* [PATCH AUTOSEL 4.19 94/97] mm/page-writeback.c: don't break integrity writeback on ->writepage() error
[not found] <20190108192949.122407-1-sashal@kernel.org>
@ 2019-01-08 19:29 ` Sasha Levin
2019-01-08 19:29 ` [PATCH AUTOSEL 4.19 95/97] mm/swap: use nr_node_ids for avail_lists in swap_info_struct Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2019-01-08 19:29 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Brian Foster, Andrew Morton, Linus Torvalds, Sasha Levin, linux-mm
From: Brian Foster <bfoster@redhat.com>
[ Upstream commit 3fa750dcf29e8606e3969d13d8e188cc1c0f511d ]
write_cache_pages() is used in both background and integrity writeback
scenarios by various filesystems. Background writeback is mostly
concerned with cleaning a certain number of dirty pages based on various
mm heuristics. It may not write the full set of dirty pages or wait for
I/O to complete. Integrity writeback is responsible for persisting a set
of dirty pages before the writeback job completes. For example, an
fsync() call must perform integrity writeback to ensure data is on disk
before the call returns.
write_cache_pages() unconditionally breaks out of its processing loop in
the event of a ->writepage() error. This is fine for background
writeback, which had no strict requirements and will eventually come
around again. This can cause problems for integrity writeback on
filesystems that might need to clean up state associated with failed page
writeouts. For example, XFS performs internal delayed allocation
accounting before returning a ->writepage() error, where applicable. If
the current writeback happens to be associated with an unmount and
write_cache_pages() completes the writeback prematurely due to error, the
filesystem is unmounted in an inconsistent state if dirty+delalloc pages
still exist.
To handle this problem, update write_cache_pages() to always process the
full set of pages for integrity writeback regardless of ->writepage()
errors. Save the first encountered error and return it to the caller once
complete. This facilitates XFS (or any other fs that expects integrity
writeback to process the entire set of dirty pages) to clean up its
internal state completely in the event of persistent mapping errors.
Background writeback continues to exit on the first error encountered.
[akpm@linux-foundation.org: fix typo in comment]
Link: http://lkml.kernel.org/r/20181116134304.32440-1-bfoster@redhat.com
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/page-writeback.c | 35 +++++++++++++++++++++--------------
1 file changed, 21 insertions(+), 14 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 84ae9bf5858a..ea4fd3af3b4b 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2156,6 +2156,7 @@ int write_cache_pages(struct address_space *mapping,
{
int ret = 0;
int done = 0;
+ int error;
struct pagevec pvec;
int nr_pages;
pgoff_t uninitialized_var(writeback_index);
@@ -2236,25 +2237,31 @@ int write_cache_pages(struct address_space *mapping,
goto continue_unlock;
trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
- ret = (*writepage)(page, wbc, data);
- if (unlikely(ret)) {
- if (ret == AOP_WRITEPAGE_ACTIVATE) {
+ error = (*writepage)(page, wbc, data);
+ if (unlikely(error)) {
+ /*
+ * Handle errors according to the type of
+ * writeback. There's no need to continue for
+ * background writeback. Just push done_index
+ * past this page so media errors won't choke
+ * writeout for the entire file. For integrity
+ * writeback, we must process the entire dirty
+ * set regardless of errors because the fs may
+ * still have state to clear for each page. In
+ * that case we continue processing and return
+ * the first error.
+ */
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
unlock_page(page);
- ret = 0;
- } else {
- /*
- * done_index is set past this page,
- * so media errors will not choke
- * background writeout for the entire
- * file. This has consequences for
- * range_cyclic semantics (ie. it may
- * not be suitable for data integrity
- * writeout).
- */
+ error = 0;
+ } else if (wbc->sync_mode != WB_SYNC_ALL) {
+ ret = error;
done_index = page->index + 1;
done = 1;
break;
}
+ if (!ret)
+ ret = error;
}
/*
--
2.19.1
^ permalink raw reply [flat|nested] 2+ messages in thread
* [PATCH AUTOSEL 4.19 95/97] mm/swap: use nr_node_ids for avail_lists in swap_info_struct
[not found] <20190108192949.122407-1-sashal@kernel.org>
2019-01-08 19:29 ` [PATCH AUTOSEL 4.19 94/97] mm/page-writeback.c: don't break integrity writeback on ->writepage() error Sasha Levin
@ 2019-01-08 19:29 ` Sasha Levin
1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2019-01-08 19:29 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Aaron Lu, Vasily Averin, Huang Ying, Andrew Morton,
Linus Torvalds, Sasha Levin, linux-mm
From: Aaron Lu <aaron.lu@intel.com>
[ Upstream commit 66f71da9dd38af17dc17209cdde7987d4679a699 ]
Since a2468cc9bfdf ("swap: choose swap device according to numa node"),
avail_lists field of swap_info_struct is changed to an array with
MAX_NUMNODES elements. This made swap_info_struct size increased to 40KiB
and needs an order-4 page to hold it.
This is not optimal in that:
1 Most systems have way less than MAX_NUMNODES(1024) nodes so it
is a waste of memory;
2 It could cause swapon failure if the swap device is swapped on
after system has been running for a while, due to no order-4
page is available as pointed out by Vasily Averin.
Solve the above two issues by using nr_node_ids(which is the actual
possible node number the running system has) for avail_lists instead of
MAX_NUMNODES.
nr_node_ids is unknown at compile time so can't be directly used when
declaring this array. What I did here is to declare avail_lists as zero
element array and allocate space for it when allocating space for
swap_info_struct. The reason why keep using array but not pointer is
plist_for_each_entry needs the field to be part of the struct, so pointer
will not work.
This patch is on top of Vasily Averin's fix commit. I think the use of
kvzalloc for swap_info_struct is still needed in case nr_node_ids is
really big on some systems.
Link: http://lkml.kernel.org/r/20181115083847.GA11129@intel.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/swap.h | 11 ++++++++++-
mm/swapfile.c | 3 ++-
2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 8e2c11e692ba..77221c16733a 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -232,7 +232,6 @@ struct swap_info_struct {
unsigned long flags; /* SWP_USED etc: see above */
signed short prio; /* swap priority of this type */
struct plist_node list; /* entry in swap_active_head */
- struct plist_node avail_lists[MAX_NUMNODES];/* entry in swap_avail_heads */
signed char type; /* strange name for an index */
unsigned int max; /* extent of the swap_map */
unsigned char *swap_map; /* vmalloc'ed array of usage counts */
@@ -273,6 +272,16 @@ struct swap_info_struct {
*/
struct work_struct discard_work; /* discard worker */
struct swap_cluster_list discard_clusters; /* discard clusters list */
+ struct plist_node avail_lists[0]; /*
+ * entries in swap_avail_heads, one
+ * entry per node.
+ * Must be last as the number of the
+ * array is nr_node_ids, which is not
+ * a fixed value so have to allocate
+ * dynamically.
+ * And it has to be an array so that
+ * plist_for_each_* can work.
+ */
};
#ifdef CONFIG_64BIT
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 8810a6d7d67f..08ba73eaaba2 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2819,8 +2819,9 @@ static struct swap_info_struct *alloc_swap_info(void)
struct swap_info_struct *p;
unsigned int type;
int i;
+ int size = sizeof(*p) + nr_node_ids * sizeof(struct plist_node);
- p = kvzalloc(sizeof(*p), GFP_KERNEL);
+ p = kvzalloc(size, GFP_KERNEL);
if (!p)
return ERR_PTR(-ENOMEM);
--
2.19.1
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2019-01-08 19:32 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20190108192949.122407-1-sashal@kernel.org>
2019-01-08 19:29 ` [PATCH AUTOSEL 4.19 94/97] mm/page-writeback.c: don't break integrity writeback on ->writepage() error Sasha Levin
2019-01-08 19:29 ` [PATCH AUTOSEL 4.19 95/97] mm/swap: use nr_node_ids for avail_lists in swap_info_struct Sasha Levin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox