Date: Tue, 12 Mar 2024 11:38:24 -0400
From: Peter Xu
To: Jiaqi Yan
Cc: James Houghton, Axel Rasmussen, David Hildenbrand, Mirsad Todorovac, linux-mm@kvack.org, Andrew Morton, Shuah Khan, linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: BUG selftests/mm]
References: <4a5c8d28-7f73-4c15-b288-641f0ccc91c2@redhat.com>

On Mon, Mar 11, 2024 at 03:28:28PM -0700, Jiaqi Yan wrote:
> On Mon, Mar 11, 2024 at 2:27 PM James Houghton wrote:
> >
> > On Mon, Mar 11, 2024 at 12:28 PM Peter Xu wrote:
> > >
> > > On Mon, Mar 11, 2024 at 11:59:59AM -0700, Axel Rasmussen wrote:
> > > > I'd prefer not to require root or CAP_SYS_ADMIN or similar for
> > > > UFFDIO_POISON, because those control access to lots more things
> > > > besides, which we don't necessarily want the process using UFFD to be
> > > > able to do. :/
> >
> > I agree; UFFDIO_POISON should not require CAP_SYS_ADMIN.
>
> +1.
>
> > >
> > > >
> > > > Ratelimiting seems fairly reasonable to me. I do see the concern about
> > > > dropping some addresses though.
> > >
> > > Do you know how much could an admin rely on such addresses? How frequent
> > > would MCE generate normally in a sane system?
> >
> > I'm not sure about how much admins rely on the address themselves. +cc
> > Jiaqi Yan
>
> I think admins mostly care about MCEs from **real** hardware. For
> example they may choose to perform some maintenance if the number of
> hardware DIMM errors, keyed by PFN, exceeds some threshold. And I
> think mcelog or /sys/devices/system/node/node${X}/memory_failure are
> better tools than dmesg.
> In the case all memory errors are emulated by
> hypervisor after a live migration, these dmesgs may confuse admins to
> think there is dimm error on host but actually it is not the case. In
> this sense, silencing these emulated by UFFDIO_POISON makes sense (if
> not too complicated to do).

Now we have three types of such error: (1) PFN poisoned, (2) swapin error,
(3) emulated.  Both (1) and (2) deserve a global message dump, while (3)
should be process-internal, and nobody else should need to care except the
process itself (via the signal + meta info).

If we want to differentiate (2) vs. (3), we may need one more pte marker bit
to show whether such poison is "global" or "local" (as of now, (2) and (3)
share the same PTE_MARKER_POISONED bit); a swapin error can still be seen as
a "global" error (instead of a mem error, it can be a disk error, and the
err msg still applies to it, describing a corrupted VA).

Another VM_FAULT_* flag is also needed to reflect that locality, so that the
global broadcast can be skipped for "local" poison faults.

>
> SIGBUS (and logged "MCE: Killing %s:%d due to hardware memory
> corruption fault at %lx\n") emitted by fault handler due to UFFDIO_POISON
> are less useful to admins AFAIK. They are for sure crucial to
> userspace / vmm / hypervisor, but the SIGBUS sent already contains the
> poisoned address (in si_addr from force_sig_mceerr).
> >
> > It's possible for a sane hypervisor dealing with a buggy guest / guest
> > userspace to trigger lots of these pr_errs. Consider the case where a
> > guest userspace uses HugeTLB-1G, finds poison (which HugeTLB used to
> > ignore), and then ignores SIGBUS. It will keep getting MCEs /
> > SIGBUSes.
> >
> > The sane hypervisor will use UFFDIO_POISON to prevent the guest from
> > re-accessing *real* poison, but we will still get the pr_err, and we
> > still keep injecting MCEs into the guest. We have observed scenarios
> > like this before.
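
For illustration only, the "global" versus "local" split discussed in this
mail could be modelled in plain userspace C roughly as below.  Every name in
the block (poison_origin, FAULT_HWPOISON, FAULT_HWPOISON_LOCAL and the helper
functions) is hypothetical and does not exist in the kernel; the bits merely
stand in for the extra pte marker bit and the extra VM_FAULT_* flag.  It is a
sketch of the decision logic under those assumptions, not a patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the three error types listed in the mail. */
enum poison_origin {
	POISON_HW_PFN,     /* (1) PFN poisoned by a real memory error */
	POISON_SWAPIN_ERR, /* (2) swap-in failed, e.g. a disk error */
	POISON_EMULATED,   /* (3) injected by userspace via UFFDIO_POISON */
};

/* Hypothetical fault-result bits standing in for VM_FAULT_* flags. */
#define FAULT_HWPOISON       0x1u /* the fault hit a poisoned address */
#define FAULT_HWPOISON_LOCAL 0x2u /* the poison is process-internal */

/* Map the origin of the poison to the flags a fault handler would return. */
static unsigned int poison_fault_flags(enum poison_origin origin)
{
	unsigned int flags = FAULT_HWPOISON;

	/* Only emulated poison is "local"; (1) and (2) stay "global". */
	if (origin == POISON_EMULATED)
		flags |= FAULT_HWPOISON_LOCAL;
	return flags;
}

/* The global message dump is skipped for "local" poison faults. */
static bool should_log_globally(unsigned int flags)
{
	return (flags & FAULT_HWPOISON) && !(flags & FAULT_HWPOISON_LOCAL);
}

/* The faulting process is signalled in every case, local or not. */
static bool should_send_sigbus(unsigned int flags)
{
	return (flags & FAULT_HWPOISON) != 0;
}
```

With this model, an emulated poison fault still delivers SIGBUS to the
process but produces no global log line, while hardware and swapin errors
keep both.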
> >
> > >
> > > > Perhaps we can mitigate that concern by defining our own ratelimit
> > > > interval/burst configuration?
> > >
> > > Any details?
> > >
> > > > Another idea would be to only ratelimit it if !CONFIG_DEBUG_VM or
> > > > similar. Not sure if that's considered valid or not. :)
> > >
> > > This, OTOH, sounds like an overkill..
> > >
> > > I just checked again on the detail of ratelimit code, where we by default
> > > it has:
> > >
> > > #define DEFAULT_RATELIMIT_INTERVAL (5 * HZ)
> > > #define DEFAULT_RATELIMIT_BURST 10
> > >
> > > So it allows a 10 times burst rather than 2.. IIUC it means even if
> > > there're continuous 10 MCEs it won't get suppressed, until the 11th came, in
> > > 5 seconds interval. I think it means it's possibly even less of a concern
> > > to directly use pr_err_ratelimited().
> >
> > I'm okay with any rate limiting everyone agrees on. IMO, silencing
> > these pr_errs if they came from UFFDIO_POISON (or, perhaps, if they
> > did not come from real hardware MCE events) sounds like the most
> > correct thing to do, but I don't mind. Just don't make UFFDIO_POISON
> > require CAP_SYS_ADMIN. :)
> >
> > Thanks.
>

-- 
Peter Xu
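
As a rough illustration of the burst arithmetic quoted above (10 messages
per 5 * HZ window, with the 11th one suppressed), here is a simplified
userspace model of a windowed rate limiter.  It is not the kernel's ratelimit
code: struct ratelimit_model, ratelimit_allow(), count_allowed() and the
MODEL_* constants are made up for this sketch, and MODEL_HZ is an assumed
tick rate.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_HZ       100UL          /* assumed tick rate, model only */
#define MODEL_INTERVAL (5 * MODEL_HZ) /* mirrors the quoted 5 * HZ */
#define MODEL_BURST    10             /* mirrors the quoted burst of 10 */

/* Hypothetical state for one rate-limited call site. */
struct ratelimit_model {
	unsigned long interval; /* window length in ticks */
	int burst;              /* messages allowed per window */
	int printed;            /* messages emitted in the current window */
	int missed;             /* messages suppressed so far */
	unsigned long begin;    /* tick at which the current window started */
	bool started;           /* false until the first call */
};

/* Return true if a message arriving at tick 'now' may be printed. */
static bool ratelimit_allow(struct ratelimit_model *rs, unsigned long now)
{
	/* Open a new window on first use or once the old one has expired. */
	if (!rs->started || now - rs->begin > rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}

/* Fire 'n' messages at the same tick and count how many get through. */
static int count_allowed(struct ratelimit_model *rs, unsigned long now, int n)
{
	int allowed = 0;

	for (int i = 0; i < n; i++)
		if (ratelimit_allow(rs, now))
			allowed++;
	return allowed;
}
```

In this model, a burst of 12 back-to-back errors lets the first 10 through
and drops 2; once the window has expired, messages are allowed again.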