[GIT PULL] y2038 changes for vfs

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[GIT PULL] y2038 changes for vfs

Arnd Bergmann
The following changes since commit bf16200689118d19de1b8d2a3c314fc21f5dc7bb:

  Linux 4.6-rc3 (2016-04-10 17:58:30 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git tags/y2038-4.7

for you to fetch changes up to 4b277763c5b3ce6d60168797e6f38260416d9b13:

  vfs: Add support to document max and min inode times (2016-04-26 18:12:30 +0200)

----------------------------------------------------------------
y2038 changes for vfs

This is a preparation series for changing the VFS infrastructure to use
time64_t in inode timestamps, introducing a couple of simple conversion
helper functions that can be used to make the individual file systems
independent of the type of inode->i_{a,c,m}time and
iattr->ia_{a,c,m}time, following the plan described at
http://kernelnewbies.org/y2038/vfs.

The patches were done by Deepa Dinamani during her Outreachy internship
along with the following set of more than 100 patches that depend on
them but that can get merged through the individual file system
maintainer trees.

Thomas Gleixner reviewed the patches and the approach before, but after
http://www.spinics.net/lists/y2038/msg01514.html, both he and Al Viro
were busy in other areas of the kernel and neither of them picked them
up. I'm sending the series for inclusion directly so we can make
progress on the follow-up series and hopefully convert struct inode
to timespec64 in 4.8 or 4.9.

One small cleanup patch from Tina Ruchandani is included in the branch
as well, this is also related to the y2038 effort and is the last one
of a backlog of older patches that I picked up last year and otherwise
got merged through maintainer trees.

----------------------------------------------------------------
Deepa Dinamani (3):
      fs: Add current_fs_time_sec() function
      vfs: Add vfs_time accessors
      vfs: Add support to document max and min inode times

Tina Ruchandani (1):
      AFS: Correctly use 64-bit time for UUID

 fs/afs/main.c      |  6 ++----
 include/linux/fs.h | 29 ++++++++++++++++++++++++++++-
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/fs/afs/main.c b/fs/afs/main.c
index 35de0c04729f..129ff432391c 100644
--- a/fs/afs/main.c
+++ b/fs/afs/main.c
@@ -14,6 +14,7 @@
 #include <linux/init.h>
 #include <linux/completion.h>
 #include <linux/sched.h>
+#include <linux/ktime.h>
 #include "internal.h"
 
 MODULE_DESCRIPTION("AFS Client File System");
@@ -37,7 +38,6 @@ struct workqueue_struct *afs_wq;
  */
 static int __init afs_get_client_UUID(void)
 {
- struct timespec ts;
  u64 uuidtime;
  u16 clockseq;
  int ret;
@@ -48,9 +48,7 @@ static int __init afs_get_client_UUID(void)
  if (ret < 0)
  return ret;
 
- getnstimeofday(&ts);
- uuidtime = (u64) ts.tv_sec * 1000 * 1000 * 10;
- uuidtime += ts.tv_nsec / 100;
+ uuidtime = ktime_divns(ktime_get_real(), 100);
  uuidtime += AFS_UUID_TO_UNIX_TIME;
  afs_uuid.time_low = uuidtime;
  afs_uuid.time_mid = uuidtime >> 32;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 70e61b58baaf..d9573060591e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1362,7 +1362,14 @@ struct super_block {
 
  /* Granularity of c/m/atime in ns.
    Cannot be worse than a second */
- u32   s_time_gran;
+ u32 s_time_gran;
+
+ /*
+ * Max and min values for timestamps
+ * according to the range supported by filesystems.
+ */
+ time64_t s_time_min;
+ time64_t s_time_max;
 
  /*
  * The next field is for VFS *only*. No filesystems have any business
@@ -1423,6 +1430,26 @@ struct super_block {
 
 extern struct timespec current_fs_time(struct super_block *sb);
 
+static inline struct timespec current_fs_time_sec(struct super_block *sb)
+{
+ return (struct timespec) { get_seconds(), 0 };
+}
+
+/* Place holder defines to ensure safe transition to timespec64
+ * in the vfs layer.
+ * These can be deleted after all filesystems and vfs are switched
+ * over to using 64 bit time.
+ */
+static inline struct timespec vfs_time_to_timespec(struct timespec inode_ts)
+{
+ return inode_ts;
+}
+
+static inline struct timespec timespec_to_vfs_time(struct timespec ts)
+{
+ return ts;
+}
+
 /*
  * Snapshotting support.
  */

Reply | Threaded
Open this post in threaded view
|

Re: [GIT PULL] y2038 changes for vfs

Linus Torvalds-2
On Tue, May 24, 2016 at 1:11 PM, Arnd Bergmann <[hidden email]> wrote:
> The following changes since commit bf16200689118d19de1b8d2a3c314fc21f5dc7bb:
>
>   Linux 4.6-rc3 (2016-04-10 17:58:30 -0700)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git tags/y2038-4.7

The more I look at this, the less I like it.

There doesn't even seem to be any *point* to the preparatory patches.
I'm not seeing what any of those patches actually help prepare. The
two new superblock fields that it adds, for example, should likely
never be touched directly by any code in the first place, so adding
them only encourages people to add more "preparatory" patches to
filesystems that simply don't seem sensible. It's not clear we want a
seconds-based interface there, when in many ways ktime_t is much
*much* preferable for internal kernel representations for the next
hundred years or so.

For example, preparing to replace CURRENT_TIME_SEC with
current_fs_time_sec() is going to be a huge big patch replacing every
single user *anyway* due to the addition of the superblock parameter.
And since we'd have to change the type in the inode, that will be a
flag-day anyway.

So I'm not seeing real advantages to the prep-work. What does it
actually *help*? I'd have seen more point to it if it had actually
converted all the existing CURRENT_TIME_SEC cases, and basically said
"the code is now syntactically ready to start using per-sb limits".

I don't much see the point of a preparatory patch that just paves the
way for a hundred other small pointless one-liner patches, when it
shouldn't be a problem to just do it in one go.

Just as an example: code that does

        dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;

could pretty mechanically be converted to

        dir->i_mtime = dir->i_ctime = current_fs_time(sb);

and there really is only about a hundred of those. THAT would be a
preparatory patch that actually adds value.

IOW, do it as one single patch that gets rid of a bad interface, not
as "one pointless preparatory patch that than makes it possible to
make a hundred other pointless patches to use it".

It's not like it's hard to compile-test the pretty mechanical
conversion. There are no architecture-specific users, so I suspect
that a trivial "make allmodconfig" build will catch all the cases.

Why drag something like this out, in other words?

                  Linus
Reply | Threaded
Open this post in threaded view
|

Re: [GIT PULL] y2038 changes for vfs

Linus Torvalds-2
On Tue, May 24, 2016 at 3:23 PM, Linus Torvalds
<[hidden email]> wrote:
>
> Just as an example: code that does
>
>         dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
>
> could pretty mechanically be converted to
>
>         dir->i_mtime = dir->i_ctime = current_fs_time(sb);

Actually, looking at the users, most of them don't have the superblock
directly as a variable, so it might be better to just make
current_fs_time() take the inode pointer instead.

That would make the conversion simpler, and it can then do the
inode->i_sb thing when it is converted to actually take the filesystem
limits and time granularity into account.

I suspect you could do 95% with a fairly simply coccinelle script. Or
even just use 'sed', with something like

  sed 's/\([a-z]*\)->i_\([amc]\)time = CURRENT_TIME_SEC/\1->i_\2time =
current_fs_time(\1)/'

seems to get close.

Run that over the tree, fix up the few cases it doesn't catch, remove
the broken CURRENT_TIME_SEC thing, and voilá, you're pretty much done.

No?

                  Linus
Reply | Threaded
Open this post in threaded view
|

Re: [GIT PULL] y2038 changes for vfs

Deepa Dinamani
On Tue, May 24, 2016 at 3:44 PM, Linus Torvalds
<[hidden email]> wrote:

> On Tue, May 24, 2016 at 3:23 PM, Linus Torvalds
> <[hidden email]> wrote:
>>
>> Just as an example: code that does
>>
>>         dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
>>
>> could pretty mechanically be converted to
>>
>>         dir->i_mtime = dir->i_ctime = current_fs_time(sb);

If we use current_fs_time() instead of CURRENT_TIME_SEC, then it also
means that all the filesystems now using these should use correct
granularity or provide update_time callbacks for time. This is not
true today. Eg: fat uses CURRENT_TIME_SEC everywhere but uses
generic_update_time() and so the granularity is in nanoseconds for in
memory timestamps even though this was not intended.

> Actually, looking at the users, most of them don't have the superblock
> directly as a variable, so it might be better to just make
> current_fs_time() take the inode pointer instead.

This is a much bigger change as current_fs_time() is used in many
places for timestamps where inode is not part of the functions.
And, this would mean reordering code. Eg:  cifs_create_dfs_fattr(),
btrfs_update_root_times()
And, we are really only relying on super_block attributes in the function.

> That would make the conversion simpler, and it can then do the
> inode->i_sb thing when it is converted to actually take the filesystem
> limits and time granularity into account.
>
> I suspect you could do 95% with a fairly simply coccinelle script. Or
> even just use 'sed', with something like
>
>   sed 's/\([a-z]*\)->i_\([amc]\)time = CURRENT_TIME_SEC/\1->i_\2time =
> current_fs_time(\1)/'
>
> seems to get close.
>
> Run that over the tree, fix up the few cases it doesn't catch, remove
> the broken CURRENT_TIME_SEC thing, and voilá, you're pretty much done.
>
> No?

Yes, replacing CURRENT_TIME_SEC with current_fs_time_sec() is pretty
straight forward.
The reason this was split into many patches was because we thought it
would be easier for individual maintainers to review the code.
This is particularly true for filesystem ranges.
For instance, CIFS has multiple ranges based on the version used: cifs
modern or cifs smb.

But, it should be fine as a single patch as well.

-Deepa
Reply | Threaded
Open this post in threaded view
|

Re: [GIT PULL] y2038 changes for vfs

Arnd Bergmann
In reply to this post by Linus Torvalds-2
On Tuesday, May 24, 2016 3:23:39 PM CEST Linus Torvalds wrote:

> On Tue, May 24, 2016 at 1:11 PM, Arnd Bergmann <[hidden email]> wrote:
> > The following changes since commit bf16200689118d19de1b8d2a3c314fc21f5dc7bb:
> >
> >   Linux 4.6-rc3 (2016-04-10 17:58:30 -0700)
> >
> > are available in the git repository at:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git tags/y2038-4.7
>
> The more I look at this, the less I like it.
>
> There doesn't even seem to be any *point* to the preparatory patches.
> I'm not seeing what any of those patches actually help prepare. The
> two new superblock fields that it adds, for example, should likely
> never be touched directly by any code in the first place, so adding
> them only encourages people to add more "preparatory" patches to
> filesystems that simply don't seem sensible.

The minimum and maximum times in the superblock are needed to
implement a proper policy for dealing with limited file systems:

Today, using 'utimes' on a file to set a date in the distant
future or past usually results in a silent overflow with a more
or less random time. We want to replace that with a sane behavior
and either fail with an error for out of range times or truncate
the the earliest or latest times that are supported by the
respective file system. Whatever we decide to do there however
requires knowing those times, and storing them in the superblock
seems much easier than adding code to each file system for checking
them.

The other point is being able to refuse writable mounts to file
systems that are close to overflowing. If someone wants to ship
a system that has longterm support beyond 2038, then any file system
that cannot store later time stamps must already not be mountable
so we can prevent it from behaving inconsistently later.

Again, the specific policy and how it's controlled (mount option,
compile-time, or sysctl to refuse the mount, or simply printing
a warning) is to be decided later, but we have to know the limits
at mount time in order to do it.

> It's not clear we want a
> seconds-based interface there, when in many ways ktime_t is much
> *much* preferable for internal kernel representations for the next
> hundred years or so.
>
> For example, preparing to replace CURRENT_TIME_SEC with
> current_fs_time_sec() is going to be a huge big patch replacing every
> single user *anyway* due to the addition of the superblock parameter.

Adding current_fs_time_sec() instead of just using current_fs_time()
is just to preserve the existing performance behavior. Reading
the current seconds value is a bit faster than reading
seconds+nanoseconds (it doesn't require computing the exact
nanosecond value), and much faster than computing the seconds
from a ktime_t. Note that most file systems don't do sub-second
timestamps at all.

> And since we'd have to change the type in the inode, that will be a
> flag-day anyway.

It's not really a flag day, after the initial per-filesystem
patches, a fairly manageable patch changes the VFS data structures
and the few things that cannot be done separately before:

https://github.com/deepa-hub/vfs/commit/09323fb679c27694475d9c12992782dcba69c493

> So I'm not seeing real advantages to the prep-work. What does it
> actually *help*? I'd have seen more point to it if it had actually
> converted all the existing CURRENT_TIME_SEC cases, and basically said
> "the code is now syntactically ready to start using per-sb limits".
>
> I don't much see the point of a preparatory patch that just paves the
> way for a hundred other small pointless one-liner patches, when it
> shouldn't be a problem to just do it in one go.
>
> Just as an example: code that does
>
>         dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
>
> could pretty mechanically be converted to
>
>         dir->i_mtime = dir->i_ctime = current_fs_time(sb);
>
> and there really is only about a hundred of those. THAT would be a
> preparatory patch that actually adds value.
>
> IOW, do it as one single patch that gets rid of a bad interface, not
> as "one pointless preparatory patch that than makes it possible to
> make a hundred other pointless patches to use it".
>
> It's not like it's hard to compile-test the pretty mechanical
> conversion. There are no architecture-specific users, so I suspect
> that a trivial "make allmodconfig" build will catch all the cases.
>
> Why drag something like this out, in other words?

The vfs_time_to_timespec/timespec_to_vfs_time accessors and the
s_time_min/s_time_max patch are really the ones that make most
sense doing per file system. These are still all really simple
patches, but it seemed logical to keep all three together and then
go through each file system one by one. The hard part here is
really catching the attention of the file system maintainers,
not doing the patches.

        Arnd
Reply | Threaded
Open this post in threaded view
|

Re: [GIT PULL] y2038 changes for vfs

Dave Chinner
On Wed, May 25, 2016 at 06:03:19PM +0200, Arnd Bergmann wrote:

> On Tuesday, May 24, 2016 3:23:39 PM CEST Linus Torvalds wrote:
> > On Tue, May 24, 2016 at 1:11 PM, Arnd Bergmann <[hidden email]> wrote:
> > > The following changes since commit bf16200689118d19de1b8d2a3c314fc21f5dc7bb:
> > >
> > >   Linux 4.6-rc3 (2016-04-10 17:58:30 -0700)
> > >
> > > are available in the git repository at:
> > >
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git tags/y2038-4.7
> >
> > The more I look at this, the less I like it.
> >
> > There doesn't even seem to be any *point* to the preparatory patches.
> > I'm not seeing what any of those patches actually help prepare. The
> > two new superblock fields that it adds, for example, should likely
> > never be touched directly by any code in the first place, so adding
> > them only encourages people to add more "preparatory" patches to
> > filesystems that simply don't seem sensible.
....
> > It's not like it's hard to compile-test the pretty mechanical
> > conversion. There are no architecture-specific users, so I suspect
> > that a trivial "make allmodconfig" build will catch all the cases.
> >
> > Why drag something like this out, in other words?

Good question, indeed.

> The vfs_time_to_timespec/timespec_to_vfs_time accessors and the
> s_time_min/s_time_max patch are really the ones that make most
> sense doing per file system. These are still all really simple
> patches, but it seemed logical to keep all three together and then
> go through each file system one by one. The hard part here is
> really catching the attention of the file system maintainers,
> not doing the patches.

I was the only filesystem person who attempted to the review your
changes 3 months ago. After the amount of shit you and Deepa dragged
me through as I tried to get you to restructure the patchset
*exactly* like Linus us now suggesting, I walked away and haven't
looked at your patches since.  Is it any wonder that no other
filesystem maintainer has bothered to waste their time on this
since?

Linus - I'd suggest these VFS timestamp patches need to go through
Al's VFS tree. That way we don't get unreviewed VFS infrastructure
changes going into your tree via a door that nobody was paying
attention to...

Cheers,

Dave.
--
Dave Chinner
[hidden email]