From: Jens Axboe
Date: Thu, 24 Apr 2025 13:20:46 -0600
Subject: Re: [PATCH] mm/userfaultfd: prevent busy looping for tasks with signals pending
To: Peter Xu
Cc: Johannes Weiner, Andrew Morton, "linux-fsdevel@vger.kernel.org", Linux-MM
Message-ID: <26a0a28c-197f-4d0b-ad58-c003d72b1700@kernel.dk>
On 4/24/25 1:13 PM, Peter Xu wrote:

(skipping to this bit as I think we're mostly in agreement on the above)

>>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>>> index 296d294142c8..fa721525d93a 100644
>>> --- a/arch/x86/mm/fault.c
>>> +++ b/arch/x86/mm/fault.c
>>> @@ -1300,9 +1300,14 @@ void do_user_addr_fault(struct pt_regs *regs,
>>>  	 * We set FAULT_FLAG_USER based on the register state, not
>>>  	 * based on X86_PF_USER. User space accesses that cause
>>>  	 * system page faults are still user accesses.
>>> +	 *
>>> +	 * When we're in user mode, allow fast response on non-fatal
>>> +	 * signals. Do not set this in kernel mode faults because normally
>>> +	 * a kernel fault means the fault must be resolved anyway before
>>> +	 * going back to userspace.
>>>  	 */
>>>  	if (user_mode(regs))
>>> -		flags |= FAULT_FLAG_USER;
>>> +		flags |= FAULT_FLAG_USER | FAULT_FLAG_INTERRUPTIBLE;
>>>
>>>  #ifdef CONFIG_X86_64
>>>  	/*
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 9b701cfbef22..a80f3f609b37 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -487,8 +487,7 @@ extern unsigned int kobjsize(const void *objp);
>>>   * arch-specific page fault handlers.
>>>   */
>>>  #define FAULT_FLAG_DEFAULT (FAULT_FLAG_ALLOW_RETRY | \
>>> -			     FAULT_FLAG_KILLABLE | \
>>> -			     FAULT_FLAG_INTERRUPTIBLE)
>>> +			     FAULT_FLAG_KILLABLE)
>>> ===8<===
>>>
>>> That also kind of matches with what we do with fault_signal_pending().
>>> Would it make sense?
>>
>> I don't think doing a non-bounded non-interruptible sleep for a
>> condition that may never resolve (eg userfaultfd never fills the fault)
>> is a good idea. What happens if the condition never becomes true? You
>
> If page fault is never going to be resolved, normally we sigkill the
> program as it can't move any further with no way to resolve the page fault.
>
> But yeah that's based on the fact sigkill will work first..

Yep

>> can't even kill the task at that point... Generally UNINTERRUPTIBLE
>> sleep should only be used if it's a bounded wait.
>>
>> For example, if I ran my previous write(2) reproducer here and the task
>> got killed or exited before the userfaultfd fills the fault, then you'd
>> have the task stuck in 'D' forever. Can't be killed, can't get
>> reclaimed.
>>
>> In other words, this won't work.
>
> .. Would you help explain why it didn't work even for SIGKILL? Above will
> still set FAULT_FLAG_KILLABLE, hence I thought SIGKILL would always work
> regardless.
>
> For such kernel user page access, IIUC it should respond to SIGKILL in
> handle_userfault(), then fault_signal_pending() would trap the SIGKILL this
> time -> going kernel fixups. Then the upper stack should get -EFAULT in the
> exception fixup path.
>
> I could have missed something..

It won't work because sending the signal will not wake the process in
question, as it's sleeping uninterruptibly, forever. My looping approach
still works for fatal signals because we abort the loop every now and
then, hence we know it won't be stuck forever. But if you don't have a
timeout on that uninterruptible sleep, sending it a signal alone won't
wake it.

Example:

axboe@m2max-kvm ~> sudo ./tufd
got buf 0xffff89800000
child will write
Page fault flags = 0; address = ffff89800000
wait on child
fish: Job 1, 'sudo ./tufd' terminated by signal SIGKILL (Forced quit)

meanwhile in ps:

root       837     837  0.0    2  0.0  14628  1220 ?        Dl   12:37   0:00 ./tufd
root       837     838  0.0    2  0.0  14628  1220 ?        Sl   12:37   0:00 ./tufd

-- 
Jens Axboe
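A rough userspace analogy of the bounded-versus-unbounded sleep argument above. This is not kernel code and not the patch under discussion, and it does not try to settle the FAULT_FLAG_KILLABLE question. A thread waits on a condition variable for a hypothetical "fault filled" event, while a separate "fatal" flag gets set without signalling that condition variable, much like a signal that does not wake a task sleeping uninterruptibly. A bare pthread_cond_wait() loop would never notice the flag. The bounded loop below sleeps at most slice_ms per iteration with pthread_cond_timedwait() and re-checks the flag on every wakeup, so it always terminates. All names here (struct fault_wait, wait_for_fill_bounded(), deliver_fatal(), fill_fault(), run_scenario()) are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a task waiting on a userfaultfd fill. */
struct fault_wait {
	pthread_mutex_t lock;
	pthread_cond_t filled_cv; /* signalled only when the fault is filled */
	bool filled;              /* set by the "fault handler" thread */
	bool fatal;               /* set WITHOUT signalling filled_cv, like a
	                             signal that cannot wake a D-state sleeper */
};

enum wait_result { WAIT_FILLED, WAIT_FATAL };

/* Absolute CLOCK_REALTIME deadline 'ms' milliseconds from now. */
static struct timespec deadline_after_ms(long ms)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += ms / 1000;
	ts.tv_nsec += (ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec += 1;
		ts.tv_nsec -= 1000000000L;
	}
	return ts;
}

/*
 * Bounded wait: sleep at most slice_ms per iteration, then re-check
 * 'fatal'. A plain pthread_cond_wait() loop here would never return
 * if the fill never comes, however 'fatal' changes.
 */
static enum wait_result wait_for_fill_bounded(struct fault_wait *w, long slice_ms)
{
	enum wait_result res;

	assert(slice_ms > 0);
	pthread_mutex_lock(&w->lock);
	for (;;) {
		if (w->filled) {
			res = WAIT_FILLED;
			break;
		}
		if (w->fatal) {
			res = WAIT_FATAL;
			break;
		}
		struct timespec ts = deadline_after_ms(slice_ms);
		/* ETIMEDOUT only means "time to re-check"; loop either way. */
		(void)pthread_cond_timedwait(&w->filled_cv, &w->lock, &ts);
	}
	pthread_mutex_unlock(&w->lock);
	return res;
}

/* Event thread: mark the waiter "fatal" after 50 ms, with no wakeup. */
static void *deliver_fatal(void *arg)
{
	struct fault_wait *w = arg;
	struct timespec delay = { 0, 50 * 1000000L };

	nanosleep(&delay, NULL);
	pthread_mutex_lock(&w->lock);
	w->fatal = true; /* deliberately no pthread_cond_signal() */
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Event thread: resolve the "fault" and wake the waiter. */
static void *fill_fault(void *arg)
{
	struct fault_wait *w = arg;

	pthread_mutex_lock(&w->lock);
	w->filled = true;
	pthread_cond_broadcast(&w->filled_cv);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Run one bounded waiter against one event thread; report how it ended. */
static enum wait_result run_scenario(void *(*event)(void *))
{
	struct fault_wait w = { .filled = false, .fatal = false };
	pthread_t t;
	enum wait_result r;

	pthread_mutex_init(&w.lock, NULL);
	pthread_cond_init(&w.filled_cv, NULL);
	pthread_create(&t, NULL, event, &w);
	r = wait_for_fill_bounded(&w, 10);
	pthread_join(t, NULL);
	pthread_cond_destroy(&w.filled_cv);
	pthread_mutex_destroy(&w.lock);
	return r;
}
```

With this sketch, run_scenario(deliver_fatal) returns WAIT_FATAL even though the fill never happens, and run_scenario(fill_fault) returns WAIT_FILLED. The slice length is the trade-off: shorter slices notice the fatal condition sooner at the cost of more wakeups.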