From: Yongji Xie
Date: Tue, 6 Aug 2024 11:10:32 +0800
Subject: Re: [PATCH] vduse: avoid using __GFP_NOFAIL
To: Jason Wang
Cc: Maxime Coquelin, Xuan Zhuo, "Michael S. Tsirkin", Eugenio Perez Martin,
    virtualization@lists.linux.dev, linux-kernel, 21cnbao@gmail.com,
    penguin-kernel@i-love.sakura.ne.jp, linux-mm@kvack.org, Andrew Morton
References: <20240805082106.65847-1-jasowang@redhat.com>

On Tue, Aug 6, 2024 at 10:28 AM Jason Wang wrote:
>
> On Mon,
> Aug 5, 2024 at 6:42 PM Yongji Xie wrote:
> >
> > On Mon, Aug 5, 2024 at 4:24 PM Jason Wang wrote:
> > >
> > > On Mon, Aug 5, 2024 at 4:21 PM Jason Wang wrote:
> > > >
> > > > Barry said [1]:
> > > >
> > > > """
> > > > mm doesn't support non-blockable __GFP_NOFAIL allocation. Because
> > > > __GFP_NOFAIL without direct reclamation may just result in a busy
> > > > loop within non-sleepable contexts.
> > > > """
> > > >
> > > > Unfortunately, we do that under read lock. A possible way to fix that
> > > > is to move the page allocation out of the lock into the caller, but
> > > > having to allocate a huge number of pages and auxiliary page array
> > > > seems to be problematic as well per Tetsuo [2]:
> > > >
> > > > """
> > > > You should implement proper error handling instead of using
> > > > __GFP_NOFAIL if count can become large.
> > > > """
> > > >
> >
> > I think the problem is it's hard to do the error handling in
> > fops->release() currently.
>
> vduse_dev_dereg_umem() should be the same, it's very hard to allow it to fail.
>
> >
> > So can we temporarily hold the user page refcount, and release it when
> > vduse_dev_open()/vduse_domain_release() is executed? The kernel page
> > allocation and memcpy can be done in vduse_dev_open() which allows
> > some error handling.
>
> Just to make sure I understand this, the free is probably not the big
> issue but the allocation itself.
>

Yes, so deferring the allocation might be a solution.

> And if we do the memcpy() in open(), it seems to be a subtle userspace
> noticeable change? (Or I don't get how copying in vduse_dev_open() can
> help here).
>

Maybe we don't need to do the copy in open(). We can hold the user page
refcount until the inflight I/O is completed. That means the allocation
of new kernel pages can be done in vduse_domain_map_bounce_page() and
the release of old user pages can be done in
vduse_domain_unmap_bounce_page().
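
To make the idea concrete, here is a rough userspace model in C. It is
not kernel code and not a patch: every model_* name, the assumption of
one mapping per bounce slot, and the allocation-failure hook are made up
for illustration. It only shows the ordering: deregistration allocates
nothing, the map path allocates lazily and can return -ENOMEM, and the
unmap path drops the pin on the old user page.

```c
/* Userspace model only. Every name here is hypothetical, not the vduse API. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096

struct model_user_page {
	int refcount;                  /* stands in for the pin on a user page */
};

struct model_slot {
	struct model_user_page *user;  /* user page backing this slot, or NULL */
	unsigned char *kernel;         /* kernel bounce page, allocated lazily */
	int inflight;                  /* 1 while an I/O is mapped on this slot */
	int deregistered;              /* 1 once the daemon dropped its umem */
};

static int model_fail_next_alloc;      /* hook: force one allocation failure */

static unsigned char *model_alloc_page(void)
{
	if (model_fail_next_alloc) {
		model_fail_next_alloc = 0;
		return NULL;
	}
	return malloc(MODEL_PAGE_SIZE);
}

static void model_put_user_page(struct model_slot *slot)
{
	slot->user->refcount--;
	slot->user = NULL;
}

/* Deregistration allocates nothing, so it has no failure path. */
static void model_dereg_umem(struct model_slot *slot)
{
	slot->deregistered = 1;
	if (slot->user && !slot->inflight)
		model_put_user_page(slot);   /* idle slot: drop the pin now */
	/* busy slot: the pin is kept until model_unmap() */
}

/* Mapping may allocate, and it can report -ENOMEM to its caller. */
static int model_map(struct model_slot *slot)
{
	if (!slot->user && !slot->kernel) {
		slot->kernel = model_alloc_page();
		if (!slot->kernel)
			return -ENOMEM;
	}
	slot->inflight = 1;
	return 0;
}

/* Unmapping ends the I/O and releases a user page that outlived its umem. */
static void model_unmap(struct model_slot *slot)
{
	assert(slot->inflight);
	slot->inflight = 0;
	if (slot->deregistered && slot->user)
		model_put_user_page(slot);
}
```

In this model, with one pinned page and an I/O in flight,
model_dereg_umem() leaves the refcount at 1, model_unmap() drops it to
0, and a later model_map() either gets a fresh kernel page or returns
-ENOMEM. Re-registration is left out of the sketch.
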
Of course, we still have a copy (old user page -> new user page) if the
daemon calls vduse_dev_reg_umem() again.

Thanks,
Yongji