Date: Fri, 21 Feb 2020 09:50:24 -0800
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, Jan Kara, Matthew Wilcox, Josef Bacik, Johannes Weiner
Subject: Re: [PATCH v2 2/2] mm: fix long time stall from mm_populate
Message-ID: <20200221175024.GC226145@google.com>
References: <20200214192951.29430-1-minchan@kernel.org> <20200214192951.29430-2-minchan@kernel.org>
In-Reply-To: <20200214192951.29430-2-minchan@kernel.org>

Bumping up.

On Fri, Feb 14, 2020 at 11:29:51AM -0800, Minchan Kim wrote:
> Basically, the fault handler releases mmap_sem before requesting
> readahead and is then supposed to retry looking up the page in the
> page cache with FAULT_FLAG_TRIED, so that it avoids a livelock of
> infinite retries.
>
> However, what happens if the fault handler finds a page in the page
> cache that has the readahead marker but is under writeback? Add one
> more condition: this happens under mm_populate, which keeps faulting
> unless it encounters an error. So let's assemble the conditions below.
>
> CPU 1                                       CPU 2
>
> - first loop
> mm_populate
>  for ()
>   ..
>   ret = populate_vma_page_range
>    __get_user_pages
>     faultin_page
>      handle_mm_fault
>       filemap_fault
>        do_async_mmap_readahead
>         if (PageReadahead(pageA))
>          maybe_unlock_mmap_for_io
>           up_read(mmap_sem)
>                                             shrink_page_list
>                                              pageout
>                                               SetPageReclaim(=SetPageReadahead)(pageA)
>                                               writepage
>                                                SetPageWriteback(pageA)
>
>         page_cache_async_readahead()
>          ClearPageReadahead(pageA)
>        do_async_mmap_readahead
>        lock_page_maybe_drop_mmap
>         goto out_retry
>
>                                             pageA is reclaimed
>                                             and a new pageB is populated at the file offset
>                                             and finally gets PG_readahead
>
> - second loop
>
>    __get_user_pages
>     faultin_page
>      handle_mm_fault
>       filemap_fault
>        do_async_mmap_readahead
>         if (PageReadahead(pageB))
>          maybe_unlock_mmap_for_io
>           up_read(mmap_sem)
>                                             shrink_page_list
>                                              pageout
>                                               SetPageReclaim(=SetPageReadahead)(pageB)
>                                               writepage
>                                                SetPageWriteback(pageB)
>
>         page_cache_async_readahead()
>          ClearPageReadahead(pageB)
>        do_async_mmap_readahead
>        lock_page_maybe_drop_mmap
>         goto out_retry
>
> This can repeat forever, so it is a livelock. Even without reclaim
> involved, it can happen if ra_pages becomes zero via fadvise or via
> other threads sharing the same fd, one doing random access while the
> other is sequential, because page_cache_async_readahead has early
> return checks such as PageWriteback and ra_pages, and ra_pages is
> never synchronized with fadvise and shrink_readahead_size_eio running
> in other threads.
>
> void page_cache_async_readahead(struct address_space *mapping,
>                                 unsigned long req_size)
> {
>         /* no read-ahead */
>         if (!ra->ra_pages)
>                 return;
>
> Thus, we need to limit fault retries from mm_populate, as the page
> fault handler does.
>
> Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
> Reviewed-by: Jan Kara
> Signed-off-by: Minchan Kim
> ---
>  mm/gup.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 1b521e0ac1de..6f6548c63ad5 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1133,7 +1133,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
>   *
>   * This takes care of mlocking the pages too if VM_LOCKED is set.
>   *
> - * return 0 on success, negative error code on error.
> + * return number of pages pinned on success, negative error code on error.
>   *
>   * vma->vm_mm->mmap_sem must be held.
>   *
> @@ -1196,6 +1196,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
>  	struct vm_area_struct *vma = NULL;
>  	int locked = 0;
>  	long ret = 0;
> +	bool tried = false;
>  
>  	end = start + len;
>  
> @@ -1226,14 +1227,18 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
>  		 * double checks the vma flags, so that it won't mlock pages
>  		 * if the vma was already munlocked.
>  		 */
> -		ret = populate_vma_page_range(vma, nstart, nend, &locked);
> +		ret = populate_vma_page_range(vma, nstart, nend,
> +				tried ? NULL : &locked);
>  		if (ret < 0) {
>  			if (ignore_errors) {
>  				ret = 0;
>  				continue;	/* continue at next VMA */
>  			}
>  			break;
> -		}
> +		} else if (ret == 0)
> +			tried = true;
> +		else
> +			tried = false;
>  		nend = nstart + ret * PAGE_SIZE;
>  		ret = 0;
>  	}
> -- 
> 2.25.0.265.gbab2e86ba0-goog
>