From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 27 Jun 2024 14:55:27 -0400
From: Peter Xu <peterx@redhat.com>
To: yangge1116@126.com
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    stable@vger.kernel.org, 21cnbao@gmail.com, baolin.wang@linux.alibaba.com,
    liuzixing@hygon.cn, David Hildenbrand, Yang Shi
Subject: Re: [PATCH] mm/gup: Use try_grab_page() instead of try_grab_folio() in gup slow path
References: <1719478388-31917-1-git-send-email-yangge1116@126.com>
In-Reply-To: <1719478388-31917-1-git-send-email-yangge1116@126.com>
On Thu, Jun 27, 2024 at 04:53:08PM +0800, yangge1116@126.com wrote:
> From: yangge
>
> If a large amount of CMA memory is configured in the system (for
> example, CMA memory accounting for 50% of system memory), starting a
> SEV virtual machine will fail.
> While starting the SEV virtual machine, it calls
> pin_user_pages_fast(..., FOLL_LONGTERM, ...) to pin memory. Normally,
> if a page is present and in a CMA area, pin_user_pages_fast() first
> calls __get_user_pages_locked() to pin the page, and then calls
> check_and_migrate_movable_pages() to migrate the page out of the CMA
> area. But currently the call to __get_user_pages_locked() fails,
> because it uses try_grab_folio() to pin the page in the gup slow
> path.
>
> Commit 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages
> != NULL"") uses try_grab_folio() in the gup slow path, which is
> problematic because try_grab_folio() checks whether the page can be
> longterm-pinned. This check may fail and cause
> __get_user_pages_locked() to fail. However, this check is not
> required in the gup slow path, so we can use try_grab_page() instead
> of try_grab_folio(). In addition, in the current code try_grab_page()
> can only add 1 to the page's refcount. We extend this function so
> that the page's refcount can be increased according to a parameter
> passed in.
>
> The following log reveals it:
>
> [  464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520
> [  464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6
> [  464.325477] RIP: 0010:__get_user_pages+0x423/0x520
> [  464.325515] Call Trace:
> [  464.325520]  <TASK>
> [  464.325523]  ? __get_user_pages+0x423/0x520
> [  464.325528]  ? __warn+0x81/0x130
> [  464.325536]  ? __get_user_pages+0x423/0x520
> [  464.325541]  ? report_bug+0x171/0x1a0
> [  464.325549]  ? handle_bug+0x3c/0x70
> [  464.325554]  ? exc_invalid_op+0x17/0x70
> [  464.325558]  ? asm_exc_invalid_op+0x1a/0x20
> [  464.325567]  ? __get_user_pages+0x423/0x520
> [  464.325575]  __gup_longterm_locked+0x212/0x7a0
> [  464.325583]  internal_get_user_pages_fast+0xfb/0x190
> [  464.325590]  pin_user_pages_fast+0x47/0x60
> [  464.325598]  sev_pin_memory+0xca/0x170 [kvm_amd]
> [  464.325616]  sev_mem_enc_register_region+0x81/0x130 [kvm_amd]
>
> Fixes: 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"")
> Cc:
> Signed-off-by: yangge

Thanks for the report and the proposed fix.  This is unfortunate..

It's just that I worry this may not be enough, as thp slow gup isn't the
only path using try_grab_folio().  There are also hugepd and memfd pinning
(which just got queued, again).  I suspect both of them can also hit a CMA
chunk here and fail when they shouldn't.

The slight complexity resides in the hugepd path, which right now shares
code with fast-gup.  So we may potentially need something similar to what
Yang used to introduce in this patch:

https://lore.kernel.org/r/20240604234858.948986-2-yang@os.amperecomputing.com

so as to identify whether the hugepd gup is slow or fast, and we should
only let the fast gup fail on those.

Let me also loop them in on the other relevant discussion.

Thanks,

-- 
Peter Xu