From: David Hildenbrand
Subject: Re: [PATCH v1] drivers/base/memory.c: indicate all memory blocks as removable
Date: Fri, 27 Mar 2020 23:42:54 +0100
Message-Id: <700D7668-8E47-4691-8E9F-97A544D660CE@redhat.com>
Cc: David Hildenbrand, Michal Hocko, Linux Kernel Mailing List, Linux MM, Greg Kroah-Hartman, "Rafael J. Wysocki", Andrew Morton, powerpc-utils-devel@googlegroups.com, util-linux@vger.kernel.org, Badari Pulavarty, Nathan Fontenot, Robert Jennings, Heiko Carstens, Karel Zak, "Scargall, Steve"
To: Dan Williams

> On 27.03.2020 at 23:13, Dan Williams wrote:
>
> On Fri, Mar 27, 2020 at 9:50 AM David Hildenbrand wrote:
>>
>>> On 27.03.20 17:28, Dan Williams wrote:
>>> On Fri, Mar 27, 2020 at 2:00 AM David Hildenbrand wrote:
>>>>
>>>> On 27.03.20 08:47, Michal Hocko wrote:
>>>>> On Thu 26-03-20 23:24:08, Dan Williams wrote:
>>>>> [...]
>>>>>> David, Andrew,
>>>>>>
>>>>>> I'd like to recommend this patch for -stable as it likely (test
>>>>>> underway) solves this crash report from Steve:
>>>>>>
>>>>>> [ 148.796036] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>>>>>> [ 148.796074] ------------[ cut here ]------------
>>>>>> [ 148.796098] kernel BUG at include/linux/mm.h:1087!
>>>>>> [ 148.796126] invalid opcode: 0000 [#1] SMP NOPTI
>>>>>> [ 148.796146] CPU: 63 PID: 5471 Comm: lsmem Not tainted 5.5.10-200.fc31.x86_64+debug #1
>>>>>> [ 148.796173] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
>>>>>> [ 148.796212] RIP: 0010:is_mem_section_removable+0x1a4/0x1b0
>>>>>> [ 148.796561] Call Trace:
>>>>>> [ 148.796591]  removable_show+0x6e/0xa0
>>>>>> [ 148.796608]  dev_attr_show+0x19/0x40
>>>>>> [ 148.796625]  sysfs_kf_seq_show+0xa9/0x100
>>>>>> [ 148.796640]  seq_read+0xd5/0x450
>>>>>> [ 148.796657]  vfs_read+0xc5/0x180
>>>>>> [ 148.796672]  ksys_read+0x68/0xe0
>>>>>> [ 148.796688]  do_syscall_64+0x5c/0xa0
>>>>>> [ 148.796704]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>>> [ 148.796721] RIP: 0033:0x7f3ab1646412
>>>>>>
>>>>>> ...on a non-debug kernel it just crashes.
>>>>>>
>>>>>> In this case lsmem is failing when reading memory96:
>>>>>>
>>>>>> openat(3, "memory96/removable", O_RDONLY|O_CLOEXEC) = 4
>>>>>> fcntl(4, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
>>>>>> fstat(4, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
>>>>>> read(4, ) = ?
>>>>>> +++ killed by SIGSEGV +++
>>>>>> Segmentation fault (core dumped)
>>>>>>
>>>>>> ...which is phys_index 0x60 => memory address 0x3000000000
>>>>>>
>>>>>> On this platform that lands us here:
>>>>>>
>>>>>> 100000000-303fffffff : System RAM
>>>>>>   291f000000-291fe00f70 : Kernel code
>>>>>>   2920000000-292051efff : Kernel rodata
>>>>>>   2920600000-292093b0bf : Kernel data
>>>>>>   29214f3000-2922dfffff : Kernel bss
>>>>>> 3040000000-305fffffff : Reserved
>>>>>> 3060000000-1aa5fffffff : Persistent Memory
>>>>>
>>>>> OK, 2GB memblocks and that would mean [0x3000000000, 0x3080000000]
>>>>>
>>>>>> ...where the last memory block of System RAM is shared with persistent
>>>>>> memory. I.e. the block is only partially online which means that
>>>>>> page_to_nid() in is_mem_section_removable() will assert or crash for
>>>>>> some of the offline pages in that block.
>>>>>
>>>>> Yes, this patch is a simple workaround. Normal memory hotplug will not
>>>>> blow up because it should be able to find out that test_pages_in_a_zone
>>>>> is false. Who knows how other potential pfn walkers handle that.
>>>>
>>>> All other pfn walkers now correctly use pfn_to_online_page() - which
>>>> will also result in false positives in this scenario and is still to be
>>>> fixed by Dan IIRC. [1]
>>>
>>> Sorry, it's been too long and this fell out of my cache. I also turned
>>> away once the major fire in KVM was put out with special consideration
>>> for devmem pages. What's left these days? ...besides
>>> removable_show()?
>>
>> Essentially any pfn_to_online_page() is a candidate.
>>
>> E.g.,
>>
>> mm/memory-failure.c:memory_failure()
>>
>> is obviously broken (could be worked around)
>
> Ooh, the current state looks worse than when I looked previously. I
> wasn't copied on commit 96c804a6ae8c ("mm/memory-failure.c: don't
> access uninitialized memmaps in memory_failure()"). That commit seems
> to ensure the pmem errors in memory sections that overlap with
> System-RAM are not handled. So that change looks broken to me.
> Previously get_dev_pagemap() was sufficient protection.
>

Well, it went in before we learned that pfn_to_online_page() is now broken
in corner cases since sub-section hotadd.

>>
>> Also
>>
>> mm/memory-failure.c:soft_offline_page()
>>
>> is obviously broken.
>
> How exactly? The soft_offline_page() callers seem to already account
> for System-RAM vs devmem.

Then my quick scan may have been wrong :)

>
>>
>>
>> Also set_zone_contiguous()->__pageblock_pfn_to_page() is broken, when it
>> checks for "page_zone(start_page) != zone" if the memmap contains garbage.
>>
>> And I only checked a handful of examples.
>
> Ok, but as the first example shows in the absence of a problem report
> these pre-emptive changes might make things worse so I don't think
> it's as simple as go instrument all the pfn_to_online_page() users.
>

Fixing pfn_to_online_page() is the right thing to do IMHO, rather than
working around the fact that it can return false positives.
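
For reference, the block arithmetic from the quoted report can be sketched
in userspace C. This is only an illustration, not kernel code: struct
phys_range, block_range() and block_partially_in() are hypothetical names
made up for this sketch, and the 2GB block size and the System RAM range
are simply the values quoted in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive physical address range, in the style of /proc/iomem.
 * Hypothetical helper type, not a kernel structure. */
struct phys_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Physical range covered by memory block `phys_index`, assuming all
 * blocks have the same size and block 0 starts at address 0. */
static struct phys_range block_range(uint64_t phys_index, uint64_t block_size)
{
	assert(block_size != 0);

	struct phys_range r = {
		.start = phys_index * block_size,
		.end = phys_index * block_size + block_size - 1,
	};
	return r;
}

/* True if `blk` overlaps `ram` without being fully contained in it,
 * i.e. only part of the block is backed by that RAM range. */
static bool block_partially_in(struct phys_range blk, struct phys_range ram)
{
	bool overlaps = blk.start <= ram.end && blk.end >= ram.start;
	bool contained = blk.start >= ram.start && blk.end <= ram.end;

	return overlaps && !contained;
}
```

With a 2GB block size, block_range(0x60, 2ULL << 30) starts at
0x3000000000, and checking it against the System RAM range
100000000-303fffffff reports a partial overlap, which matches the
"only partially online" block described in the report.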