From: Jens Axboe
Date: Thu, 24 Apr 2025 08:54:40 -0600
Subject: Re: [PATCH] mm/userfaultfd: prevent busy looping for tasks with signals pending
To: Johannes Weiner
Cc: Andrew Morton, Peter Xu, "linux-fsdevel@vger.kernel.org", Linux-MM
References: <27c3a7f5-aad8-4f2a-a66e-ff5ae98f31eb@kernel.dk> <20250424140344.GA840@cmpxchg.org>
In-Reply-To: <20250424140344.GA840@cmpxchg.org>

On 4/24/25 8:03 AM, Johannes Weiner wrote:
> On Wed, Apr 23, 2025 at 05:37:06PM -0600, Jens Axboe wrote:
>> userfaultfd may
>> use interruptible sleeps to wait on userspace filling
>> a page fault, which works fine if the task can be reliably put to
>> sleeping waiting for that. However, if the task has a normal (ie
>> non-fatal) signal pending, then TASK_INTERRUPTIBLE sleep will simply
>> cause schedule() to be a no-op.
>>
>> For a task that registers a page with userfaultfd and then proceeds
>> to do a write from it, if that task also has a signal pending then
>> it'll essentially busy loop from do_page_fault() -> handle_userfault()
>> until that fault has been filled. Normally it'd be expected that the
>> task would sleep until that happens. Here's a trace from an application
>> doing just that:
>>
>> handle_userfault+0x4b8/0xa00 (P)
>> hugetlb_fault+0xe24/0x1060
>> handle_mm_fault+0x2bc/0x318
>> do_page_fault+0x1e8/0x6f0
>
> Makes sense. There is a fault_signal_pending() check before retrying:
>
> static inline bool fault_signal_pending(vm_fault_t fault_flags,
>                                         struct pt_regs *regs)
> {
>         return unlikely((fault_flags & VM_FAULT_RETRY) &&
>                         (fatal_signal_pending(current) ||
>                          (user_mode(regs) && signal_pending(current))));
> }
>
> Since it's an in-kernel fault, and the signal is non-fatal, it won't
> stop looping until the fault is handled.
>
> This in itself seems a bit sketchy. You have to hope there is no
> dependency between handling the signal -> handling the fault inside
> the userspace components.

Indeed... But that's generic userfaultfd sketchiness, not really related
to this patch.

>
>> do_translation_fault+0x9c/0xd0
>> do_mem_abort+0x44/0xa0
>> el1_abort+0x3c/0x68
>> el1h_64_sync_handler+0xd4/0x100
>> el1h_64_sync+0x6c/0x70
>> fault_in_readable+0x74/0x108 (P)
>> iomap_file_buffered_write+0x14c/0x438
>> blkdev_write_iter+0x1a8/0x340
>> vfs_write+0x20c/0x348
>> ksys_write+0x64/0x108
>> __arm64_sys_write+0x1c/0x38
>>
>> where the task is looping with 100% CPU time in the above mentioned
>> fault path.
>>
>> Since it's impossible to handle signals, or other conditions like
>> TIF_NOTIFY_SIGNAL that also prevents interruptible sleeping, from the
>> fault path, use TASK_UNINTERRUPTIBLE with a short timeout even for vmf
>> modes that would normally ask for INTERRUPTIBLE or KILLABLE sleep. Fatal
>> signals will still be handled by the caller, and the timeout is short
>> enough to hopefully not cause any issues. If this is the first invocation
>> of this fault, eg FAULT_FLAG_TRIED isn't set, then the normal sleep mode
>> is used.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
>
> When this patch was first introduced, VM_FAULT_RETRY would work only
> once. The second try would have FAULT_FLAG_ALLOW_RETRY cleared,
> causing handle_userfault() to return VM_SIGBUS, which would bubble
> through the fixup table (kernel fault), -EFAULT from
> iomap_file_buffered_write() and unwind the kernel stack this way.
>
> So I'm thinking this is the more likely commit for Fixes: and stable:
>
> commit 4064b982706375025628094e51d11cf1a958a5d3
> Author: Peter Xu
> Date:   Wed Apr 1 21:08:45 2020 -0700
>
>     mm: allow VM_FAULT_RETRY for multiple times

Thanks for checking that - yep that sounds fine to me, we can adjust the
fixes tag appropriately.

>> Reported-by: Zhiwei Jiang
>> Link: https://lore.kernel.org/io-uring/20250422162913.1242057-1-qq282012236@gmail.com/
>> Signed-off-by: Jens Axboe
>
> Acked-by: Johannes Weiner

Thanks!

--
Jens Axboe
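
For illustration only, the retry decision made by the fault_signal_pending()
helper quoted in this thread can be modeled in plain userspace C. This is a
rough sketch, not kernel code: MODEL_VM_FAULT_RETRY, struct fault_ctx,
model_fault_signal_pending() and model_self_check() are hypothetical names
made up for the example, and the booleans merely stand in for user_mode(regs),
fatal_signal_pending(current) and signal_pending(current).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's VM_FAULT_RETRY bit (value is illustrative). */
#define MODEL_VM_FAULT_RETRY 0x1u

/* Hypothetical snapshot of the state the kernel helper reads from regs and current. */
struct fault_ctx {
	bool user_mode;      /* models user_mode(regs) */
	bool fatal_pending;  /* models fatal_signal_pending(current) */
	bool signal_pending; /* models signal_pending(current) */
};

/* Same boolean structure as the quoted helper: true means "stop retrying the fault". */
bool model_fault_signal_pending(unsigned int fault_flags,
				const struct fault_ctx *ctx)
{
	return (fault_flags & MODEL_VM_FAULT_RETRY) &&
	       (ctx->fatal_pending ||
		(ctx->user_mode && ctx->signal_pending));
}

/* Walks through the cases discussed in the thread. */
void model_self_check(void)
{
	/* In-kernel fault with a non-fatal signal pending: keep retrying. */
	struct fault_ctx kernel_nonfatal = { false, false, true };
	/* Same fault taken from user mode: the signal ends the retry loop. */
	struct fault_ctx user_nonfatal = { true, false, true };
	/* A fatal signal ends the retry loop even for an in-kernel fault. */
	struct fault_ctx kernel_fatal = { false, true, true };

	assert(!model_fault_signal_pending(MODEL_VM_FAULT_RETRY, &kernel_nonfatal));
	assert(model_fault_signal_pending(MODEL_VM_FAULT_RETRY, &user_nonfatal));
	assert(model_fault_signal_pending(MODEL_VM_FAULT_RETRY, &kernel_fatal));
	/* Without the retry flag the helper never reports a pending signal. */
	assert(!model_fault_signal_pending(0, &user_nonfatal));
}
```

The first case is the one from the trace above: the fault is taken in kernel
mode from fault_in_readable(), the pending signal is non-fatal, so the model
returns false and the fault path would go around again.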