From: Grzegorz Halat
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-doc@vger.kernel.org, ghalat@redhat.com, ssaner@redhat.com,
	atomlin@redhat.com, oleksandr@redhat.com, vbendel@redhat.com,
	kirill@shutemov.name, khlebnikov@yandex-team.ru,
	borntraeger@de.ibm.com, Andrew Morton, Iurii Zaikin, Kees Cook,
	Luis Chamberlain, Jonathan Corbet, Tetsuo Handa, Qian Cai
Subject: [PATCH 1/1] mm: sysctl: add panic_on_inconsistent_mm sysctl
Date: Wed, 29 Jan 2020 19:08:51 +0100
Message-Id: <20200129180851.551109-1-ghalat@redhat.com>
X-Mailer: git-send-email 2.24.1

The memory management subsystem performs various checks at runtime. If
an inconsistency is detected, the event is logged and the kernel
continues to run. While debugging such problems, it is helpful to
collect a memory dump as early as possible. Currently, there is no easy
way to panic the kernel when such an error is detected.

It was proposed[1] to panic the kernel if panic_on_oops is set, but this
approach was not accepted. One of the alternative proposals was to
introduce a new sysctl.

Add a new sysctl, panic_on_inconsistent_mm.
If the sysctl is set, the kernel panics when memory management detects
an inconsistency. Currently this means a panic when a bad page or a bad
PTE is detected (this may be extended to other places in MM).

Another use case for this sysctl is security-sensitive environments,
where it may be preferable to crash the machine rather than continue
running with potentially damaged data structures.

Changes since v1 [2]:
- rename the sysctl to panic_on_inconsistent_mm
- move the sysctl from kernel to vm table
- print modules in print_bad_pte() only before calling panic

[1] https://lore.kernel.org/linux-mm/1426495021-6408-1-git-send-email-borntraeger@de.ibm.com/
[2] https://lore.kernel.org/lkml/20200127101100.92588-1-ghalat@redhat.com/

Signed-off-by: Grzegorz Halat
---
 Documentation/admin-guide/sysctl/vm.rst | 14 ++++++++++++++
 include/linux/kernel.h                  |  1 +
 kernel/sysctl.c                         |  9 +++++++++
 mm/memory.c                             |  8 ++++++++
 mm/page_alloc.c                         |  4 +++-
 5 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 64aeee1009ca..57f7926a64b8 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -61,6 +61,7 @@ Currently, these files are in /proc/sys/vm:
 - overcommit_memory
 - overcommit_ratio
 - page-cluster
+- panic_on_inconsistent_mm
 - panic_on_oom
 - percpu_pagelist_fraction
 - stat_interval
@@ -741,6 +742,19 @@ extra faults and I/O delays for following faults if they would have been part of
 that consecutive pages readahead would have brought in.
 
 
+panic_on_inconsistent_mm
+========================
+
+Controls the kernel's behaviour when inconsistency is detected
+by memory management code, for example bad page state or bad PTE.
+
+0: try to continue operation.
+
+1: panic immediately.
+
+The default value is 0.
+
+
 panic_on_oom
 ============
 
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 0d9db2a14f44..b3bd94c558ab 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -518,6 +518,7 @@ extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in
 extern int panic_timeout;
 extern unsigned long panic_print;
 extern int panic_on_oops;
+extern int panic_on_inconsistent_mm;
 extern int panic_on_unrecovered_nmi;
 extern int panic_on_io_nmi;
 extern int panic_on_warn;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 70665934d53e..a9733311e3a1 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1303,6 +1303,15 @@ static struct ctl_table vm_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= &two,
 	},
+	{
+		.procname	= "panic_on_inconsistent_mm",
+		.data		= &panic_on_inconsistent_mm,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
 	{
 		.procname	= "panic_on_oom",
 		.data		= &sysctl_panic_on_oom,
diff --git a/mm/memory.c b/mm/memory.c
index 45442d9a4f52..b29a18077a6a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -71,6 +71,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -88,6 +89,8 @@
 #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
 #endif
 
+int panic_on_inconsistent_mm __read_mostly;
+
 #ifndef CONFIG_NEED_MULTIPLE_NODES
 /* use the per-pgdat data instead for discontigmem - mbligh */
 unsigned long max_mapnr;
@@ -543,6 +546,11 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
 		 vma->vm_ops ? vma->vm_ops->fault : NULL,
 		 vma->vm_file ? vma->vm_file->f_op->mmap : NULL,
 		 mapping ? mapping->a_ops->readpage : NULL);
+
+	if (panic_on_inconsistent_mm) {
+		print_modules();
+		panic("Bad page map detected");
+	}
 	dump_stack();
 	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d047bf7d8fd4..a20cd3ece5ba 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -643,9 +643,11 @@ static void bad_page(struct page *page, const char *reason,
 	if (bad_flags)
 		pr_alert("bad because of flags: %#lx(%pGp)\n",
 						bad_flags, &bad_flags);
-	dump_page_owner(page);
 
+	dump_page_owner(page);
 	print_modules();
+	if (panic_on_inconsistent_mm)
+		panic("Bad page state detected");
 	dump_stack();
 out:
 	/* Leave bad fields for debug, except PageBuddy could make trouble */
-- 
2.21.1
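
For anyone who wants to exercise the knob from userspace, here is a
minimal sketch. It assumes the patch is applied and that the file shows
up as /proc/sys/vm/panic_on_inconsistent_mm, as the vm.rst hunk implies.
The helper names set_panic_flag() and get_panic_flag() are hypothetical
and not part of the patch. The 0..1 range check only mirrors the
SYSCTL_ZERO/SYSCTL_ONE bounds of the vm_table entry; the kernel would
reject other values on its own.

```c
#include <stdio.h>

/* Assumed location: vm.rst lists the vm sysctls under /proc/sys/vm. */
#define PANIC_ON_INCONSISTENT_MM_PATH "/proc/sys/vm/panic_on_inconsistent_mm"

/*
 * Hypothetical helper, not part of the patch: write 0 or 1 to a
 * sysctl file. Returns 0 on success, -1 on a rejected value or an
 * I/O error.
 */
static int set_panic_flag(const char *path, int value)
{
	FILE *f;
	int ok;

	/* Same bounds as extra1/extra2 in the vm_table entry. */
	if (value < 0 || value > 1)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", value) > 0;
	/* Buffered write errors are reported by fclose(). */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}

/*
 * Hypothetical helper, not part of the patch: read the current
 * setting. Returns the value, or -1 if the file cannot be read or
 * parsed.
 */
static int get_panic_flag(const char *path)
{
	FILE *f = fopen(path, "r");
	int value = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);

	return value;
}
```

Calling set_panic_flag(PANIC_ON_INCONSISTENT_MM_PATH, 1) as root before
reproducing a bad page or bad PTE report should then turn the report
into a panic, so that a crash dump can be captured if one is configured.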