From: Jason Wang
Date: Wed, 31 Jul 2024 12:13:39 +0800
Subject: Re: [PATCH RFT v2 1/4] vpda: try to fix the potential crash due to misusing __GFP_NOFAIL
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, 42.hyeyoo@gmail.com,
	cl@linux.com, hailong.liu@oppo.com, hch@infradead.org,
	iamjoonsoo.kim@lge.com, lstoakes@gmail.com, mhocko@suse.com,
	penberg@kernel.org, rientjes@google.com, roman.gushchin@linux.dev,
	torvalds@linux-foundation.org, urezki@gmail.com, v-songbaohua@oppo.com,
	vbabka@suse.cz, virtualization@lists.linux.dev, "Michael S. Tsirkin",
	Xuan Zhuo, Eugenio Pérez, Maxime Coquelin
References: <20240731000155.109583-1-21cnbao@gmail.com>
	<20240731000155.109583-2-21cnbao@gmail.com>

On Wed, Jul 31, 2024 at 12:12 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Wed, Jul 31, 2024 at 11:58 AM Jason Wang wrote:
> >
> > On Wed, Jul 31, 2024 at 11:15 AM Barry Song <21cnbao@gmail.com> wrote:
> > >
> > > On Wed, Jul 31, 2024 at 11:10 AM Jason Wang wrote:
> > > >
> > > > On Wed, Jul 31, 2024 at 8:03 AM Barry Song <21cnbao@gmail.com> wrote:
> > > > >
> > > > > From: Barry Song
> > > > >
> > > > > mm doesn't support non-blockable __GFP_NOFAIL allocation, because
> > > > > __GFP_NOFAIL without direct reclamation may just result in a busy
> > > > > loop within non-sleepable contexts.
> > > > >
> > > > > static inline struct page *
> > > > > __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > > > > 						struct alloc_context *ac)
> > > > > {
> > > > > 	...
> > > > > 	/*
> > > > > 	 * Make sure that __GFP_NOFAIL request doesn't leak out and make sure
> > > > > 	 * we always retry
> > > > > 	 */
> > > > > 	if (gfp_mask & __GFP_NOFAIL) {
> > > > > 		/*
> > > > > 		 * All existing users of the __GFP_NOFAIL are blockable, so warn
> > > > > 		 * of any new users that actually require GFP_NOWAIT
> > > > > 		 */
> > > > > 		if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
> > > > > 			goto fail;
> > > > > 		...
> > > > > 	}
> > > > > 	...
> > > > > fail:
> > > > > 	warn_alloc(gfp_mask, ac->nodemask,
> > > > > 			"page allocation failure: order:%u", order);
> > > > > got_pg:
> > > > > 	return page;
> > > > > }
> > > > >
> > > > > Let's move the memory allocation out of the atomic context and use
> > > > > the normal sleepable context to get pages.
> > > > >
> > > > > [RFT]: This has only been compile-tested; I'd prefer if the VDPA maintainers
> > > > > handle it.
> > > > >
> > > > > Cc: "Michael S. Tsirkin"
> > > > > Cc: Jason Wang
> > > > > Cc: Xuan Zhuo
> > > > > Cc: "Eugenio Pérez"
> > > > > Cc: Maxime Coquelin
> > > > > Signed-off-by: Barry Song
> > > > > ---
> > > > >  drivers/vdpa/vdpa_user/iova_domain.c | 31 ++++++++++++++++++++++++++-----
> > > > >  drivers/vdpa/vdpa_user/iova_domain.h |  5 ++++-
> > > > >  drivers/vdpa/vdpa_user/vduse_dev.c   |  4 +++-
> > > > >  3 files changed, 33 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
> > > > > index 791d38d6284c..9318f059a8b5 100644
> > > > > --- a/drivers/vdpa/vdpa_user/iova_domain.c
> > > > > +++ b/drivers/vdpa/vdpa_user/iova_domain.c
> > > > > @@ -283,7 +283,23 @@ int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain,
> > > > >  	return ret;
> > > > >  }
> > > > >
> > > > > -void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain)
> > > > > +struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain)
> > > > > +{
> > > > > +	struct page **pages;
> > > > > +	unsigned long count, i;
> > > > > +
> > > > > +	if (!domain->user_bounce_pages)
> > > > > +		return NULL;
> > > > > +
> > > > > +	count = domain->bounce_size >> PAGE_SHIFT;
> > > > > +	pages = kmalloc_array(count, sizeof(*pages), GFP_KERNEL | __GFP_NOFAIL);
> > > > > +	for (i = 0; i < count; i++)
> > > > > +		pages[i] = alloc_page(GFP_KERNEL | __GFP_NOFAIL);
> > > > > +
> > > > > +	return pages;
> > > > > +}
> > > > > +
> > > > > +void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain, struct page **pages)
> > > > >  {
> > > > >  	struct vduse_bounce_map *map;
> > > > >  	unsigned long i, count;
> > > > > @@ -294,15 +310,16 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain)
> > > > >
> > > > >  	count = domain->bounce_size >> PAGE_SHIFT;
> > > > >  	for (i = 0; i < count; i++) {
> > > > > -		struct page *page = NULL;
> > > > > +		struct page *page = pages[i];
> > > > >
> > > > >  		map = &domain->bounce_maps[i];
> > > > > -		if (WARN_ON(!map->bounce_page))
> > > > > +		if (WARN_ON(!map->bounce_page)) {
> > > > > +			put_page(page);
> > > > >  			continue;
> > > > > +		}
> > > > >
> > > > >  		/* Copy user page to kernel page if it's in use */
> > > > >  		if (map->orig_phys != INVALID_PHYS_ADDR) {
> > > > > -			page = alloc_page(GFP_ATOMIC | __GFP_NOFAIL);
> > > > >  			memcpy_from_page(page_address(page),
> > > > >  					 map->bounce_page, 0, PAGE_SIZE);
> > > > >  		}
> > > > > @@ -310,6 +327,7 @@ void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain)
> > > > >  		map->bounce_page = page;
> > > > >  	}
> > > > >  	domain->user_bounce_pages = false;
> > > > > +	kfree(pages);
> > > > >  out:
> > > > >  	write_unlock(&domain->bounce_lock);
> > > > >  }
> > > > > @@ -543,10 +561,13 @@ static int vduse_domain_mmap(struct file *file, struct vm_area_struct *vma)
> > > > >  static int vduse_domain_release(struct inode *inode, struct file *file)
> > > > >  {
> > > > >  	struct vduse_iova_domain *domain = file->private_data;
> > > > > +	struct page **pages;
> > > > > +
> > > > > +	pages = vduse_domain_alloc_pages_to_remove_bounce(domain);
> > > > >
> > > > >  	spin_lock(&domain->iotlb_lock);
> > > > >  	vduse_iotlb_del_range(domain, 0, ULLONG_MAX);
> > > > > -	vduse_domain_remove_user_bounce_pages(domain);
> > > > > +	vduse_domain_remove_user_bounce_pages(domain, pages);
> > > > >  	vduse_domain_free_kernel_bounce_pages(domain);
> > > > >  	spin_unlock(&domain->iotlb_lock);
> > > > >  	put_iova_domain(&domain->stream_iovad);
> > > > > diff --git a/drivers/vdpa/vdpa_user/iova_domain.h b/drivers/vdpa/vdpa_user/iova_domain.h
> > > > > index f92f22a7267d..17efa5555b3f 100644
> > > > > --- a/drivers/vdpa/vdpa_user/iova_domain.h
> > > > > +++ b/drivers/vdpa/vdpa_user/iova_domain.h
> > > > > @@ -74,7 +74,10 @@ void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain);
> > > > >  int vduse_domain_add_user_bounce_pages(struct vduse_iova_domain *domain,
> > > > >  				       struct page **pages, int count);
> > > > >
> > > > > -void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain);
> > > > > +void vduse_domain_remove_user_bounce_pages(struct vduse_iova_domain *domain,
> > > > > +					   struct page **pages);
> > > > > +
> > > > > +struct page **vduse_domain_alloc_pages_to_remove_bounce(struct vduse_iova_domain *domain);
> > > > >
> > > > >  void vduse_domain_destroy(struct vduse_iova_domain *domain);
> > > > >
> > > > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > index 7ae99691efdf..5d8d5810df57 100644
> > > > > --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > > > > @@ -1030,6 +1030,7 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,
> > > > >  static int vduse_dev_dereg_umem(struct vduse_dev *dev,
> > > > >  				u64 iova, u64 size)
> > > > >  {
> > > > > +	struct page **pages;
> > > > >  	int ret;
> > > > >
> > > > >  	mutex_lock(&dev->mem_lock);
> > > > > @@ -1044,7 +1045,8 @@ static int vduse_dev_dereg_umem(struct vduse_dev *dev,
> > > > >  	if (dev->umem->iova != iova || size != dev->domain->bounce_size)
> > > > >  		goto unlock;
> > > > >
> > > > > -	vduse_domain_remove_user_bounce_pages(dev->domain);
> > > > > +	pages = vduse_domain_alloc_pages_to_remove_bounce(dev->domain);
> > > > > +	vduse_domain_remove_user_bounce_pages(dev->domain, pages);
> > > > >  	unpin_user_pages_dirty_lock(dev->umem->pages,
> > > > >  				    dev->umem->npages, true);
> > > > >  	atomic64_sub(dev->umem->npages, &dev->umem->mm->pinned_vm);
> > > >
> > > > We miss a kfree(pages); here?
> > > No.
> > > I've moved it into vduse_domain_remove_user_bounce_pages().
> >
> > OK, but it seems tricky, e.g. allocated by the caller but freed in
> > the callee. And I think I missed some important issues in the previous
> > review: the check of user_bounce_pages must be done under the
> > bounce_lock, otherwise it might race with umem_reg.
> >
> > So in the case of release(), we know the device is gone, so there's no
> > need to allocate pages that will be released soon. So we can pass NULL
> > as a hint and just assign bounce_page to NULL in
> > vduse_domain_remove_user_bounce_pages().
> >
> > And in the case of vduse_dev_dereg_umem(), we need to allocate the
> > pages without checking user_bounce_pages. So in
> > vduse_domain_remove_user_bounce_pages(), we can also free the allocated
> > pages in the following check:
> >
> > 	if (!domain->user_bounce_pages)
> > 		goto out;
> >
> > What do you think?
>
> I am not a vdpa guy, but changing the current logic is another patch.
> From the mm perspective, I can only address the __GFP_NOFAIL issue.
>
> I actually prefer you guys handle it directly :-) I'd rather report a BUG
> instead. TBH, I know nothing about vdpa.

Fine, let me post a patch for this (no later than the end of this week).

Thanks

>
> >
> > Thanks
> >
> > >
> > > >
> > > > Thanks
> > > >
> > > > > --
> > > > > 2.34.1
> > > > >
> > > >
> Thanks
> Barry
>
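The approach proposed in the quoted discussion can be illustrated with a small userspace analogue: allocate the replacement pages in sleepable context without looking at user_bounce_pages, then test the flag under bounce_lock and free the preallocation on the early-exit path, with NULL passed from release() as a "no copies needed" hint. This is only a sketch under stated assumptions, not vduse code. malloc() stands in for alloc_page(GFP_KERNEL | __GFP_NOFAIL), a pthread mutex stands in for the bounce_lock rwlock, a plain in_use flag stands in for the orig_phys != INVALID_PHYS_ADDR test, and every sketch_* name is hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096 /* hypothetical page size for this sketch */

/* Hypothetical userspace stand-in for struct vduse_iova_domain. */
struct sketch_domain {
	pthread_mutex_t bounce_lock; /* plays the role of bounce_lock */
	bool user_bounce_pages;      /* only read or written under bounce_lock */
	size_t count;                /* number of bounce pages, fixed at setup */
	void **bounce_pages;         /* current bounce page for each slot */
	bool *in_use;                /* stands in for orig_phys != INVALID_PHYS_ADDR */
};

/* Free the first n pages held by the array, then the array; NULL is a no-op. */
static void sketch_free_pages(void **pages, size_t n)
{
	if (!pages)
		return;
	for (size_t i = 0; i < n; i++)
		free(pages[i]);
	free(pages);
}

/*
 * Step 1 (sleepable context, no lock held): allocate the replacement
 * pages unconditionally, without looking at user_bounce_pages.
 */
static void **sketch_alloc_pages(size_t count)
{
	void **pages = calloc(count, sizeof(*pages));

	if (!pages)
		return NULL;
	for (size_t i = 0; i < count; i++) {
		pages[i] = malloc(SKETCH_PAGE_SIZE);
		if (!pages[i]) {
			sketch_free_pages(pages, i);
			return NULL;
		}
	}
	return pages;
}

/*
 * Step 2: check user_bounce_pages under the lock. pages == NULL is the
 * release() hint: slots are simply cleared instead of being backed by
 * copies. Returns true if the user pages were detached from the domain.
 */
static bool sketch_remove_user_bounce_pages(struct sketch_domain *d, void **pages)
{
	bool removed = false;

	pthread_mutex_lock(&d->bounce_lock);
	if (!d->user_bounce_pages)
		goto out; /* early exit: the preallocation is freed below */

	for (size_t i = 0; i < d->count; i++) {
		void *page = pages ? pages[i] : NULL;

		/* Copy the user page into the new page only if it is in use. */
		if (page && d->in_use[i])
			memcpy(page, d->bounce_pages[i], SKETCH_PAGE_SIZE);
		d->bounce_pages[i] = page;
		if (pages)
			pages[i] = NULL; /* ownership moved to the domain */
	}
	d->user_bounce_pages = false;
	removed = true;
out:
	pthread_mutex_unlock(&d->bounce_lock);
	/* Frees whatever the array still owns: every page after an early exit. */
	sketch_free_pages(pages, d->count);
	return removed;
}
```

In this sketch a NULL page array means "just drop the bounce pages", so a dereg-style caller would treat a NULL return from sketch_alloc_pages() as an error rather than pass it on; the quoted patch sidesteps that case by relying on __GFP_NOFAIL.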