From: Axel Rasmussen
Date: Tue, 12 Mar 2024 09:47:31 -0700
Subject: Re: BUG selftests/mm]
To: Peter Xu
Cc: Jiaqi Yan, James Houghton, David Hildenbrand, Mirsad Todorovac,
    linux-mm@kvack.org, Andrew Morton, Shuah Khan,
    linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org
References: <4a5c8d28-7f73-4c15-b288-641f0ccc91c2@redhat.com>

On Tue, Mar 12, 2024 at 8:38 AM Peter Xu wrote:
>
> On Mon, Mar 11, 2024 at 03:28:28PM -0700, Jiaqi Yan wrote:
> > On Mon, Mar 11, 2024 at 2:27 PM James Houghton wrote:
> > >
> > > On Mon, Mar 11, 2024 at 12:28 PM Peter Xu wrote:
> > > >
> > > > On Mon, Mar 11, 2024 at 11:59:59AM -0700, Axel Rasmussen wrote:
> > > > > I'd prefer not to require root or CAP_SYS_ADMIN or similar for
> > > > > UFFDIO_POISON, because those control access to lots more things
> > > > > besides, which we don't necessarily want the process using UFFD to be
> > > > > able to do. :/
> > >
> > > I agree; UFFDIO_POISON should not require CAP_SYS_ADMIN.
> >
> > +1.
> >
> > >
> > > > >
> > > > > Ratelimiting seems fairly reasonable to me. I do see the concern about
> > > > > dropping some addresses though.
> > > >
> > > > Do you know how much could an admin rely on such addresses? How frequent
> > > > would MCE generate normally in a sane system?
> > >
> > > I'm not sure about how much admins rely on the address themselves. +cc
> > > Jiaqi Yan
> >
> > I think admins mostly care about MCEs from **real** hardware. For
> > example they may choose to perform some maintenance if the number of
> > hardware DIMM errors, keyed by PFN, exceeds some threshold. And I
> > think mcelog or /sys/devices/system/node/node${X}/memory_failure are
> > better tools than dmesg. In the case all memory errors are emulated by
> > hypervisor after a live migration, these dmesgs may confuse admins to
> > think there is dimm error on host but actually it is not the case. In
> > this sense, silencing these emulated by UFFDIO_POISON makes sense (if
> > not too complicated to do).
>
> Now we have three types of such error: (1) PFN poisoned, (2) swapin error,
> (3) emulated. Both 1+2 should deserve a global message dump, while (3)
> should be process-internal, and nobody else should need to care except the
> process itself (via the signal + meta info).
>
> If we want to differentiate (2) vs. (3), we may need 1 more pte marker bit
> to show whether such poison is "global" or "local" (while as of now 2+3
> shares the usage of the same PTE_MARKER_POISONED bit); a swapin error can
> still be seen as a "global" error (instead of a mem error, it can be a disk
> error, and the err msg still applies to it describing a VA corrupt).
> Another VM_FAULT_* flag is also needed to reflect that locality, then
> ignore a global broadcast for "local" poison faults.
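
The global/local split proposed above can be sketched in plain user-space C. This is only an illustration of the idea, not kernel code and not the eventual patch: every name prefixed with sketch_ or SKETCH_ is hypothetical, and the bit values are arbitrary. It assumes one extra marker bit that tags a poison entry as "local" (emulated via UFFDIO_POISON), plus one extra fault flag that carries that locality up to the code that decides whether to log.

```c
#include <stdbool.h>

/* Hypothetical marker bits; names and values are illustrative only. */
enum sketch_marker {
    SKETCH_MARKER_POISONED = 1 << 0, /* entry is poisoned (any source) */
    SKETCH_MARKER_LOCAL    = 1 << 1, /* assumed extra bit: emulated poison */
};

/* Hypothetical fault result flags, standing in for VM_FAULT_* style bits. */
enum sketch_fault {
    SKETCH_FAULT_HWPOISON       = 1 << 0, /* fault hit a poisoned entry */
    SKETCH_FAULT_HWPOISON_LOCAL = 1 << 1, /* assumed extra flag: local poison */
};

/* Translate a marker into fault flags, preserving the locality bit. */
static unsigned int sketch_fault_from_marker(unsigned int marker)
{
    unsigned int fault = 0;

    if (!(marker & SKETCH_MARKER_POISONED))
        return 0;

    fault |= SKETCH_FAULT_HWPOISON;
    if (marker & SKETCH_MARKER_LOCAL)
        fault |= SKETCH_FAULT_HWPOISON_LOCAL;
    return fault;
}

/* The faulting process is signalled for every poison fault. */
static bool sketch_should_signal(unsigned int fault)
{
    return (fault & SKETCH_FAULT_HWPOISON) != 0;
}

/* The global log message is skipped when the poison is local. */
static bool sketch_should_log(unsigned int fault)
{
    return (fault & SKETCH_FAULT_HWPOISON) != 0 &&
           (fault & SKETCH_FAULT_HWPOISON_LOCAL) == 0;
}
```

In this sketch a PFN-poison or swapin-error marker still produces both the signal and the log line, while an emulated marker produces only the signal, which matches the behaviour described in the quoted proposal.
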

It's easy to implement, as long as folks aren't too offended by taking
one more bit. :) I can send a patch for this on Monday if there are no
objections.

>
> >
> > SIGBUS (and logged "MCE: Killing %s:%d due to hardware memory
> > corruption fault at %lx\n") emitted by fault handler due to UFFDIO_POISON
> > are less useful to admins AFAIK. They are for sure crucial to
> > userspace / vmm / hypervisor, but the SIGBUS sent already contains the
> > poisoned address (in si_addr from force_sig_mceerr).
> >
> > >
> > > It's possible for a sane hypervisor dealing with a buggy guest / guest
> > > userspace to trigger lots of these pr_errs. Consider the case where a
> > > guest userspace uses HugeTLB-1G, finds poison (which HugeTLB used to
> > > ignore), and then ignores SIGBUS. It will keep getting MCEs /
> > > SIGBUSes.
> > >
> > > The sane hypervisor will use UFFDIO_POISON to prevent the guest from
> > > re-accessing *real* poison, but we will still get the pr_err, and we
> > > still keep injecting MCEs into the guest. We have observed scenarios
> > > like this before.
> > >
> > > >
> > > > > Perhaps we can mitigate that concern by defining our own ratelimit
> > > > > interval/burst configuration?
> > > >
> > > > Any details?
> > > >
> > > > > Another idea would be to only ratelimit it if !CONFIG_DEBUG_VM or
> > > > > similar. Not sure if that's considered valid or not. :)
> > > >
> > > > This, OTOH, sounds like an overkill..
> > > >
> > > > I just checked again on the detail of ratelimit code, where we by default
> > > > it has:
> > > >
> > > > #define DEFAULT_RATELIMIT_INTERVAL (5 * HZ)
> > > > #define DEFAULT_RATELIMIT_BURST 10
> > > >
> > > > So it allows a 10 times burst rather than 2.. IIUC it means even if
> > > > there're continuous 10 MCEs it won't get suppressed, until the 11th came, in
> > > > 5 seconds interval. I think it means it's possibly even less of a concern
> > > > to directly use pr_err_ratelimited().
> > >
> > > I'm okay with any rate limiting everyone agrees on. IMO, silencing
> > > these pr_errs if they came from UFFDIO_POISON (or, perhaps, if they
> > > did not come from real hardware MCE events) sounds like the most
> > > correct thing to do, but I don't mind. Just don't make UFFDIO_POISON
> > > require CAP_SYS_ADMIN. :)
> > >
> > > Thanks.
> >
>
> --
> Peter Xu
>
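
The burst/interval arithmetic quoted in the thread (a burst of 10 per 5 * HZ window, so the 11th message in a window is the first one dropped) can be checked with a small user-space model. This is a simplified sketch and not the kernel's ratelimit implementation: struct sketch_rl and the sketch_rl_* helpers are hypothetical names, and time is passed in as a plain tick counter rather than read from jiffies.

```c
#include <stdbool.h>

/* Hypothetical user-space model of a burst/interval rate limiter. */
struct sketch_rl {
    long interval; /* window length, in ticks */
    int burst;     /* messages allowed per window */
    long begin;    /* tick at which the current window started */
    int printed;   /* messages allowed in the current window */
    int missed;    /* messages suppressed in the current window */
    bool started;  /* false until the first event opens a window */
};

/* Return true if an event arriving at tick `now` may be logged. */
static bool sketch_rl_allow(struct sketch_rl *rs, long now)
{
    /* Open a fresh window on first use or once the old one has expired. */
    if (!rs->started || now - rs->begin > rs->interval) {
        rs->begin = now;
        rs->printed = 0;
        rs->missed = 0;
        rs->started = true;
    }

    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }

    rs->missed++;
    return false;
}

/* Feed n back-to-back events at the same tick; return how many pass. */
static int sketch_rl_count_allowed(struct sketch_rl *rs, long now, int n)
{
    int allowed = 0;

    for (int i = 0; i < n; i++) {
        if (sketch_rl_allow(rs, now))
            allowed++;
    }
    return allowed;
}
```

With interval set to 500 ticks (5 seconds at an assumed HZ of 100) and burst set to 10, feeding 11 events at the same tick lets the first 10 through and suppresses only the 11th; the next event after the window expires is allowed again.
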