[RFC][PATCH 00/12] Enhanced file stat system call

classic Classic list List threaded Threaded
45 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: [RFC][PATCH 00/12] Enhanced file stat system call

Martin Steigerwald
Am Dienstag, 24. November 2015, 00:13:08 CET schrieb Christoph Hellwig:
> On Fri, Nov 20, 2015 at 05:19:31PM +0100, Martin Steigerwald wrote:
> > I know its mostly relevant for just for FAT32, but on any account rather
> > than trying to write 4 GiB and then file, it would be good to at some
> > time get a dialog at the beginning of the copy.
>
> pathconf/fpathconf is supposed to handle that.  It's not super pretty
> but part of Posix.  Linus hates it, but it might be time to give it
> another try.

It might be interesting for BTRFS as well, to be able to ask what amount of
free space there currently is *at* a given path. Cause with BTRFS and
Subvolumes this may differ between different paths. Even tough its not
implemented yet, it may be possible in the future to have one subvolume with
RAID 1 profile and one with RAID 0 profile.

That said an application wanting to make sure it can write a certain amount of
data can use fallocate. And thats thats the only reliable way to ensure it, I
know of. Which can become tedious for several files, but there is no principal
problem with preallocating all files if their sizes are known. Even rsync or
desktop environments could work like that. First fallocate everything, then,
only if that succeeds, start actually copying data. Disadvantage: On aborted
copies you have all files with their correct sizes and no easy indicates on
where the copy stopped.

Thanks,
--
Martin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [RFC][PATCH 00/12] Enhanced file stat system call

Christoph Hellwig
On Tue, Nov 24, 2015 at 09:48:22AM +0100, Martin Steigerwald wrote:
> It might be interesting for BTRFS as well, to be able to ask what amount of
> free space there currently is *at* a given path. Cause with BTRFS and
> Subvolumes this may differ between different paths.

We can handle this trivial with the current statfs interface.  Take a
look at xfs_fs_statfs and xfs_qm_statvfs.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [RFC][PATCH 00/12] Enhanced file stat system call

Casey Schaufler
In reply to this post by Christoph Hellwig
On 11/24/2015 12:15 AM, Christoph Hellwig wrote:
> On Fri, Nov 20, 2015 at 08:50:22AM -0800, Casey Schaufler wrote:
>> How about relevant xattrs? SELinux context, ACL, that sort of thing.
>> The fact that these are optional should be taken care of by (4).
> Those are not simple, fixed size stat data and would make the system
> call a giant mess.
>
I didn't say it would be easy. I do think that adding a system call
that only deals with simple, fixed size data is going to fall short
of solving "the problem".

Actually, a Smack label is fixed size (256 bytes). I suspect there
is a maximum for SELinux contexts as well. ACLs I'll grant you are
infinite.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [RFC][PATCH 00/12] Enhanced file stat system call

Andreas Dilger-7
In reply to this post by Casey Schaufler
On Nov 20, 2015, at 9:50 AM, Casey Schaufler <[hidden email]> wrote:

> On 11/20/2015 6:54 AM, David Howells wrote:
>> Implement new system calls to provide enhanced file stats and enhanced
>> filesystem stats.  The patches can be found here:
>>
>> http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=xstat
>>
>>
>> ===========
>> DESCRIPTION
>> ===========
>>
>> The third patch provides this new system call:
>>
>> long ret = statx(int dfd,
>> const char *filename,
>> unsigned atflag,
>> unsigned mask,
>> struct statx *buffer);
>>
>> This is an enhanced file stat function that provides a number of useful
>> features, in summary:
>>
>>  (1) More information: creation time, data version number,
>>      flags/attributes.  A subset of these is available through a number of
>>      filesystems (such as CIFS, NFS, AFS, Ext4 and BTRFS).
>>
>>  (2) Lightweight stat (AT_NO_ATTR_SYNC): Ask for just those details of
>>      interest, and allow a network fs to approximate anything not of
>>      interest, without going to the server.
>>
>>  (3) Heavyweight stat (AT_FORCE_ATTR_SYNC): Force a network fs to flush
>>      buffers and go to the server, even if it thinks its cached attributes
>>      are up to date.
>>
>>  (4) Allow the filesystem to indicate what it can/cannot provide: A
>>      filesystem can now say it doesn't support a standard stat feature if
>>      that isn't available.
>>
>>  (5) Make the fields a consistent size on all arches, and make them large.
>>
>>  (6) Can be extended by using more request flags and using up the padding
>>      space in the statx struct.
>
> How about relevant xattrs? SELinux context, ACL, that sort of thing.
> The fact that these are optional should be taken care of by (4).
Given that there are a wide variety of xattrs that different apps might be
interested in, this would probably be better served by an enhancement to
getxattr() or listxattr() to be able to retrieve a whole list of xattrs
at once, possibly with some wildcard support (e.g. "security.*") instead
of returning all or a specific subset of xattrs with statx() (which is
geared toward fixed-size attributes).

Cheers, Andreas


>> Note that no lstat() equivalent is required as that can be implemented
>> through statx() with atflag == 0.  There is also no fstat() equivalent as
>> that can be implemented through statx() with filename == NULL and the
>> relevant fd passed as dfd.
>>
>>
>> The seventh patch provides another new system call:
>>
>> long ret = fsinfo(int dfd,
>>  const char *filename,
>>  unsigned atflag,
>>  unsigned request,
>>  void *buffer);
>>
>> This is an enhanced filesystem stat and information retrieval function that
>> provides more information, in summary:
>>
>>  (1) All the information provided by statfs() and more.  The fields are
>>      made large.
>>
>>  (2) Provides information about timestamp range and resolution to
>>      complement statx().
>>
>>  (3) Provides information about IOC flags supported in statx()'s return.
>>
>>  (4) Provides volume binary IDs and UUIDs.
>>
>>  (5) Provides the filesystem name according to the kernel as a string
>>      (eg. "ext4" or "nfs3") in addition to the magic number.
>>
>>  (6) Provides information obtained from network filesystems, such as volume
>>      and domain names.
>>
>>  (7) Has lots of spare space that can be used for future extenstions and a
>>      bit mask indicating what was provided.
>>
>> Note that I've added a 'request' identifier.  This is to select the set of
>> data to be returned.  The idea is that 'buffer' points to a fixed-size
>> struct selected by request.  Currently only 0 is available and this refers
>> to 'struct fsinfo'.  However, I could split up the buffer into say 3:
>>
>>  (0) statfs-type information
>>
>>  (1) Timestamp and IOC flags info.
>>
>>  (2) Network fs strings.
>>
>> However, some of this might be better retrieved through getxattr().
>>
>>
>> =======
>> TESTING
>> =======
>>
>> Test programs are added into samples/statx/ by the appropriate patches.
>>
>> David
>> ---
>> David Howells (12):
>>       Ext4: Fix extended timestamp encoding and decoding
>>       statx: Provide IOC flags for Windows fs attributes
>>       statx: Add a system call to make enhanced file info available
>>       statx: AFS: Return enhanced file attributes
>>       statx: Ext4: Return enhanced file attributes
>>       statx: NFS: Return enhanced file attributes
>>       statx: CIFS: Return enhanced attributes
>>       fsinfo: Add a system call to make enhanced filesystem info available
>>       fsinfo: Ext4: Return information through the filesystem info syscall
>>       fsinfo: AFS: Return information through the filesystem info syscall
>>       fsinfo: NFS: Return information through the filesystem info syscall
>>       fsinfo: CIFS: Return information through the filesystem info syscall
>>
>>
>>  arch/x86/entry/syscalls/syscall_32.tbl |    2
>>  arch/x86/entry/syscalls/syscall_64.tbl |    2
>>  fs/afs/inode.c                         |   23 ++
>>  fs/afs/super.c                         |   39 ++++
>>  fs/cifs/cifsfs.c                       |   25 +++
>>  fs/cifs/cifsfs.h                       |    4
>>  fs/cifs/cifsglob.h                     |    8 +
>>  fs/cifs/dir.c                          |    2
>>  fs/cifs/inode.c                        |  124 ++++++++++---
>>  fs/cifs/netmisc.c                      |    4
>>  fs/exportfs/expfs.c                    |    4
>>  fs/ext4/ext4.h                         |   24 ++-
>>  fs/ext4/file.c                         |    2
>>  fs/ext4/inode.c                        |   31 +++
>>  fs/ext4/namei.c                        |    2
>>  fs/ext4/super.c                        |   39 ++++
>>  fs/ext4/symlink.c                      |    2
>>  fs/nfs/inode.c                         |   45 ++++-
>>  fs/nfs/internal.h                      |    1
>>  fs/nfs/nfs4super.c                     |    1
>>  fs/nfs/super.c                         |   58 ++++++
>>  fs/ntfs/time.h                         |    2
>>  fs/stat.c                              |  305 +++++++++++++++++++++++++++++---
>>  fs/statfs.c                            |  218 +++++++++++++++++++++++
>>  include/linux/fs.h                     |    7 +
>>  include/linux/stat.h                   |   14 +
>>  include/linux/syscalls.h               |    6 +
>>  include/linux/time64.h                 |    2
>>  include/uapi/linux/fcntl.h             |    2
>>  include/uapi/linux/fs.h                |    7 +
>>  include/uapi/linux/stat.h              |  185 +++++++++++++++++++
>>  samples/Makefile                       |    3
>>  samples/statx/Makefile                 |   13 +
>>  samples/statx/test-fsinfo.c            |  179 +++++++++++++++++++
>>  samples/statx/test-statx.c             |  273 +++++++++++++++++++++++++++++
>>  35 files changed, 1558 insertions(+), 100 deletions(-)
>>  create mode 100644 samples/statx/Makefile
>>  create mode 100644 samples/statx/test-fsinfo.c
>>  create mode 100644 samples/statx/test-statx.c
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to [hidden email]
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [hidden email]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Cheers, Andreas






signature.asc (850 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 07/12] statx: CIFS: Return enhanced attributes

Steve French-2
In reply to this post by David Howells
Is it worth storing the same creation time twice (in slightly
different formats, struct timespec and u64 DCE time) in cifsInodeInfo?

On Fri, Nov 20, 2015 at 8:55 AM, David Howells <[hidden email]> wrote:

> Return enhanced attributes from the CIFS filesystem.  This includes the
> following:
>
>  (1) Return the file creation time as btime.  We assume that the creation
>      time won't change over the life of the inode.
>
>  (2) Set STATX_INFO_AUTOMOUNT on referral/submount directories.
>
>  (3) Unset STATX_INO if we made up the inode number and didn't get it from
>      the server.
>
>  (4) Unset STATX_[UG]ID if we are either returning values passed to mount
>      and/or the server doesn't return them.
>
>  (5) Set STATX_IOC_FLAGS and map various Windows file attributes to
>      FS_xxx_FL flags in st_ioc_flags, fetching them from the server if we
>      don't have them yet or don't have a current copy.  This includes the
>      following:
>
>         ATTR_READONLY   -> FS_IMMUTABLE_FL
>         ATTR_COMPRESSED -> FS_COMPR_FL
>         ATTR_HIDDEN     -> FS_HIDDEN_FL
>         ATTR_SYSTEM     -> FS_SYSTEM_FL
>         ATTR_ARCHIVE    -> FS_ARCHIVE_FL
>
>  (6) Set certain STATX_INFO_xxx to reflect other Windows file attributes:
>
>         ATTR_TEMPORARY  -> STATX_INFO_TEMPORARY;
>         ATTR_REPARSE    -> STATX_INFO_REPARSE_POINT;
>         ATTR_OFFLINE    -> STATX_INFO_OFFLINE;
>         ATTR_ENCRYPTED  -> STATX_INFO_ENCRYPTED;
>
>  (7) Set STATX_INFO_REMOTE on all files fetched by CIFS.
>
>  (8) Set STATX_INFO_NONSYSTEM_OWNERSHIP on all files as they all have
>      Windows ownership details too.
>
> Furthermore, what cifs_getattr() does can be controlled as follows:
>
>  (1) If AT_NO_ATTR_SYNC is indicated then this will suppress the flushing
>      of outstanding writes and the rereading of the inode's attributes with
>      the server as detailed below.
>
>  (2) Otherwise:
>
>      (a) If AT_FORCE_ATTR_SYNC is indicated, or mtime, ctime or size are
>          requested then the outstanding writes will be written to the
>          server first.
>
>      (b) The inode's attributes will be reread from the server:
>
>          (i) if AT_FORCE_ATTR_SYNC is indicated;
>
>         (ii) if the cached attributes have expired;
>
>        (iii) extra attributes are requested that aren't normally stored.
>
> If the inode isn't synchronised, then the cached attributes will be used -
> even if expired - without reference to the server.  Some attributes may be
> unavailable that would otherwise be provided.
>
> Note that cifs_revalidate_dentry() will issue an extra operation to get the
> FILE_ALL_INFO in addition to the FILE_UNIX_BASIC_INFO if it needs to
> collect creation time and attributes on behalf of cifs_getattr().
>
> [NOTE: THIS PATCH IS UNTESTED!]
>
> Signed-off-by: David Howells <[hidden email]>
> ---
>
>  fs/cifs/cifsfs.h   |    4 +-
>  fs/cifs/cifsglob.h |    8 +++
>  fs/cifs/dir.c      |    2 -
>  fs/cifs/inode.c    |  124 +++++++++++++++++++++++++++++++++++++++++-----------
>  4 files changed, 108 insertions(+), 30 deletions(-)
>
> diff --git a/fs/cifs/cifsfs.h b/fs/cifs/cifsfs.h
> index c3cc1609025f..fb38d47d84de 100644
> --- a/fs/cifs/cifsfs.h
> +++ b/fs/cifs/cifsfs.h
> @@ -71,9 +71,9 @@ extern int cifs_rmdir(struct inode *, struct dentry *);
>  extern int cifs_rename2(struct inode *, struct dentry *, struct inode *,
>                         struct dentry *, unsigned int);
>  extern int cifs_revalidate_file_attr(struct file *filp);
> -extern int cifs_revalidate_dentry_attr(struct dentry *);
> +extern int cifs_revalidate_dentry_attr(struct dentry *, bool, bool);
>  extern int cifs_revalidate_file(struct file *filp);
> -extern int cifs_revalidate_dentry(struct dentry *);
> +extern int cifs_revalidate_dentry(struct dentry *, bool, bool);
>  extern int cifs_invalidate_mapping(struct inode *inode);
>  extern int cifs_revalidate_mapping(struct inode *inode);
>  extern int cifs_zap_mapping(struct inode *inode);
> diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
> index b406a32deb1f..493e40a15b86 100644
> --- a/fs/cifs/cifsglob.h
> +++ b/fs/cifs/cifsglob.h
> @@ -1152,7 +1152,11 @@ struct cifsInodeInfo {
>         unsigned long flags;
>         spinlock_t writers_lock;
>         unsigned int writers;           /* Number of writers on this inode */
> +       bool btime_valid:1;             /* stored creation time is valid */
> +       bool uid_faked:1;               /* true if i_uid is faked */
> +       bool gid_faked:1;               /* true if i_gid is faked */
>         unsigned long time;             /* jiffies of last update of inode */
> +       struct timespec btime;          /* creation time */
>         u64  server_eof;                /* current file size on server -- protected by i_lock */
>         u64  uniqueid;                  /* server inode number */
>         u64  createtime;                /* creation time on server */
> @@ -1365,6 +1369,9 @@ struct dfs_info3_param {
>  #define CIFS_FATTR_NEED_REVAL          0x4
>  #define CIFS_FATTR_INO_COLLISION       0x8
>  #define CIFS_FATTR_UNKNOWN_NLINK       0x10
> +#define CIFS_FATTR_WINATTRS_VALID      0x20    /* T if cf_btime and cf_cifsattrs valid */
> +#define CIFS_FATTR_UID_FAKED           0x40    /* T if cf_uid is faked */
> +#define CIFS_FATTR_GID_FAKED           0x80    /* T if cf_gid is faked */
>
>  struct cifs_fattr {
>         u32             cf_flags;
> @@ -1382,6 +1389,7 @@ struct cifs_fattr {
>         struct timespec cf_atime;
>         struct timespec cf_mtime;
>         struct timespec cf_ctime;
> +       struct timespec cf_btime;
>  };
>
>  static inline void free_dfs_info_param(struct dfs_info3_param *param)
> diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
> index c3eb998a99bd..4984f04b0677 100644
> --- a/fs/cifs/dir.c
> +++ b/fs/cifs/dir.c
> @@ -792,7 +792,7 @@ cifs_d_revalidate(struct dentry *direntry, unsigned int flags)
>                 return -ECHILD;
>
>         if (d_really_is_positive(direntry)) {
> -               if (cifs_revalidate_dentry(direntry))
> +               if (cifs_revalidate_dentry(direntry, false, false))
>                         return 0;
>                 else {
>                         /*
> diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
> index 6b66dd5d1540..fcb024efbd4b 100644
> --- a/fs/cifs/inode.c
> +++ b/fs/cifs/inode.c
> @@ -166,13 +166,21 @@ cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr)
>         cifs_nlink_fattr_to_inode(inode, fattr);
>         inode->i_uid = fattr->cf_uid;
>         inode->i_gid = fattr->cf_gid;
> +       if (fattr->cf_flags & CIFS_FATTR_UID_FAKED)
> +               cifs_i->uid_faked = true;
> +       if (fattr->cf_flags & CIFS_FATTR_GID_FAKED)
> +               cifs_i->gid_faked = true;
>
>         /* if dynperm is set, don't clobber existing mode */
>         if (inode->i_state & I_NEW ||
>             !(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_DYNPERM))
>                 inode->i_mode = fattr->cf_mode;
>
> -       cifs_i->cifsAttrs = fattr->cf_cifsattrs;
> +       if (fattr->cf_flags & CIFS_FATTR_WINATTRS_VALID) {
> +               cifs_i->cifsAttrs = fattr->cf_cifsattrs;
> +               cifs_i->btime = fattr->cf_btime;
> +               cifs_i->btime_valid = true;
> +       }
>
>         if (fattr->cf_flags & CIFS_FATTR_NEED_REVAL)
>                 cifs_i->time = 0;
> @@ -284,18 +292,22 @@ cifs_unix_basic_to_fattr(struct cifs_fattr *fattr, FILE_UNIX_BASIC_INFO *info,
>                 u64 id = le64_to_cpu(info->Uid);
>                 if (id < ((uid_t)-1)) {
>                         kuid_t uid = make_kuid(&init_user_ns, id);
> -                       if (uid_valid(uid))
> +                       if (uid_valid(uid)) {
>                                 fattr->cf_uid = uid;
> +                               fattr->cf_flags |= CIFS_FATTR_UID_FAKED;
> +                       }
>                 }
>         }
> -
> +
>         fattr->cf_gid = cifs_sb->mnt_gid;
>         if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID)) {
>                 u64 id = le64_to_cpu(info->Gid);
>                 if (id < ((gid_t)-1)) {
>                         kgid_t gid = make_kgid(&init_user_ns, id);
> -                       if (gid_valid(gid))
> +                       if (gid_valid(gid)) {
>                                 fattr->cf_gid = gid;
> +                               fattr->cf_flags |= CIFS_FATTR_GID_FAKED;
> +                       }
>                 }
>         }
>
> @@ -324,7 +336,8 @@ cifs_create_dfs_fattr(struct cifs_fattr *fattr, struct super_block *sb)
>         fattr->cf_ctime = CURRENT_TIME;
>         fattr->cf_mtime = CURRENT_TIME;
>         fattr->cf_nlink = 2;
> -       fattr->cf_flags |= CIFS_FATTR_DFS_REFERRAL;
> +       fattr->cf_flags |= CIFS_FATTR_DFS_REFERRAL |
> +               CIFS_FATTR_UID_FAKED | CIFS_FATTR_GID_FAKED;
>  }
>
>  static int
> @@ -590,6 +603,7 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info,
>         struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
>
>         memset(fattr, 0, sizeof(*fattr));
> +       fattr->cf_flags = CIFS_FATTR_WINATTRS_VALID;
>         fattr->cf_cifsattrs = le32_to_cpu(info->Attributes);
>         if (info->DeletePending)
>                 fattr->cf_flags |= CIFS_FATTR_DELETE_PENDING;
> @@ -601,6 +615,7 @@ cifs_all_info_to_fattr(struct cifs_fattr *fattr, FILE_ALL_INFO *info,
>
>         fattr->cf_ctime = cifs_NTtimeToUnix(info->ChangeTime);
>         fattr->cf_mtime = cifs_NTtimeToUnix(info->LastWriteTime);
> +       fattr->cf_btime = cifs_NTtimeToUnix(info->CreationTime);
>
>         if (adjust_tz) {
>                 fattr->cf_ctime.tv_sec += tcon->ses->server->timeAdj;
> @@ -1887,7 +1902,8 @@ int cifs_revalidate_file_attr(struct file *filp)
>         return rc;
>  }
>
> -int cifs_revalidate_dentry_attr(struct dentry *dentry)
> +int cifs_revalidate_dentry_attr(struct dentry *dentry,
> +                               bool want_extra_bits, bool force)
>  {
>         unsigned int xid;
>         int rc = 0;
> @@ -1898,7 +1914,7 @@ int cifs_revalidate_dentry_attr(struct dentry *dentry)
>         if (inode == NULL)
>                 return -ENOENT;
>
> -       if (!cifs_inode_needs_reval(inode))
> +       if (!force && !cifs_inode_needs_reval(inode))
>                 return rc;
>
>         xid = get_xid();
> @@ -1915,9 +1931,12 @@ int cifs_revalidate_dentry_attr(struct dentry *dentry)
>                  full_path, inode, inode->i_count.counter,
>                  dentry, dentry->d_time, jiffies);
>
> -       if (cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext)
> +       if (cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext) {
>                 rc = cifs_get_inode_info_unix(&inode, full_path, sb, xid);
> -       else
> +               if (rc != 0)
> +                       goto out;
> +       }
> +       if (!cifs_sb_master_tcon(CIFS_SB(sb))->unix_ext || want_extra_bits)
>                 rc = cifs_get_inode_info(&inode, full_path, NULL, sb,
>                                          xid, NULL);
>
> @@ -1940,12 +1959,13 @@ int cifs_revalidate_file(struct file *filp)
>  }
>
>  /* revalidate a dentry's inode attributes */
> -int cifs_revalidate_dentry(struct dentry *dentry)
> +int cifs_revalidate_dentry(struct dentry *dentry,
> +                          bool want_extra_bits, bool force)
>  {
>         int rc;
>         struct inode *inode = d_inode(dentry);
>
> -       rc = cifs_revalidate_dentry_attr(dentry);
> +       rc = cifs_revalidate_dentry_attr(dentry, want_extra_bits, force);
>         if (rc)
>                 return rc;
>
> @@ -1958,28 +1978,62 @@ int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry,
>         struct cifs_sb_info *cifs_sb = CIFS_SB(dentry->d_sb);
>         struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
>         struct inode *inode = d_inode(dentry);
> +       struct cifsInodeInfo *cifs_i = CIFS_I(inode);
> +       bool force = stat->query_flags & AT_FORCE_ATTR_SYNC;
> +       bool want_extra_bits = false;
> +       u32 info, ioc = 0;
> +       u32 attrs;
>         int rc;
>
> -       /*
> -        * We need to be sure that all dirty pages are written and the server
> -        * has actual ctime, mtime and file length.
> -        */
> -       if (!CIFS_CACHE_READ(CIFS_I(inode)) && inode->i_mapping &&
> -           inode->i_mapping->nrpages != 0) {
> -               rc = filemap_fdatawait(inode->i_mapping);
> -               if (rc) {
> -                       mapping_set_error(inode->i_mapping, rc);
> -                       return rc;
> +       if (cifs_i->uid_faked)
> +               stat->request_mask &= ~STATX_UID;
> +       if (cifs_i->gid_faked)
> +               stat->request_mask &= ~STATX_GID;
> +
> +       if ((stat->request_mask & STATX_BTIME && !cifs_i->btime_valid) ||
> +           stat->request_mask & STATX_IOC_FLAGS)
> +               want_extra_bits = force = true;
> +
> +       if (!(stat->query_flags & AT_NO_ATTR_SYNC)) {
> +               /* Unless we're explicitly told not to sync, we need to be sure
> +                * that all dirty pages are written and the server has actual
> +                * ctime, mtime and file length.
> +                */
> +               bool flush = force;
> +
> +               if (stat->request_mask &
> +                   (STATX_CTIME | STATX_MTIME | STATX_SIZE))
> +                       flush = true;
> +
> +               if (flush &&
> +                   !CIFS_CACHE_READ(CIFS_I(inode)) && inode->i_mapping &&
> +                   inode->i_mapping->nrpages != 0) {
> +                       rc = filemap_fdatawait(inode->i_mapping);
> +                       if (rc) {
> +                               mapping_set_error(inode->i_mapping, rc);
> +                               return rc;
> +                       }
>                 }
> -       }
>
> -       rc = cifs_revalidate_dentry_attr(dentry);
> -       if (rc)
> -               return rc;
> +               rc = cifs_revalidate_dentry(dentry, want_extra_bits, force);
> +               if (rc)
> +                       return rc;
> +       }
>
>         generic_fillattr(inode, stat);
>         stat->blksize = CIFS_MAX_MSGSIZE;
> -       stat->ino = CIFS_I(inode)->uniqueid;
> +
> +       info = STATX_INFO_REMOTE | STATX_INFO_NONSYSTEM_OWNERSHIP;
> +
> +       if (cifs_i->btime_valid) {
> +               stat->btime = cifs_i->btime;
> +               stat->result_mask |= STATX_BTIME;
> +       }
> +
> +       /* We don't promise an inode number if we made one up */
> +       stat->ino = cifs_i->uniqueid;
> +       if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM))
> +               stat->result_mask &= ~STATX_INO;
>
>         /*
>          * If on a multiuser mount without unix extensions or cifsacl being
> @@ -1993,8 +2047,24 @@ int cifs_getattr(struct vfsmount *mnt, struct dentry *dentry,
>                         stat->uid = current_fsuid();
>                 if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_OVERR_GID))
>                         stat->gid = current_fsgid();
> +               stat->result_mask &= ~(STATX_UID | STATX_GID);
>         }
> -       return rc;
> +
> +       attrs = cifs_i->cifsAttrs;
> +       if (attrs & ATTR_TEMPORARY)     info |= STATX_INFO_TEMPORARY;
> +       if (attrs & ATTR_REPARSE)       info |= STATX_INFO_REPARSE_POINT;
> +       if (attrs & ATTR_OFFLINE)       info |= STATX_INFO_OFFLINE;
> +       if (attrs & ATTR_ENCRYPTED)     info |= STATX_INFO_ENCRYPTED;
> +       stat->information |= info;
> +
> +       if (attrs & ATTR_READONLY)      ioc |= FS_IMMUTABLE_FL;
> +       if (attrs & ATTR_COMPRESSED)    ioc |= FS_COMPR_FL;
> +       if (attrs & ATTR_HIDDEN)        ioc |= FS_HIDDEN_FL;
> +       if (attrs & ATTR_SYSTEM)        ioc |= FS_SYSTEM_FL;
> +       if (attrs & ATTR_ARCHIVE)       ioc |= FS_ARCHIVE_FL;
> +       stat->ioc_flags |= ioc;
> +
> +       return 0;
>  }
>
>  static int cifs_truncate_page(struct address_space *mapping, loff_t from)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [hidden email]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Thanks,

Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 01/12] Ext4: Fix extended timestamp encoding and decoding

Andreas Dilger-7
In reply to this post by David Howells
On Nov 20, 2015, at 7:54 AM, David Howells <[hidden email]> wrote:

>
> The handling of extended timestamps in Ext4 is broken as can be seen in the
> output of the test program attached below:
>
> time     extra   bad decode        good decode         bad encode  good encode
> ======== =====   ================= =================   =========== ===========
> ffffffff     0 >  ffffffffffffffff  ffffffffffffffff > *ffffffff 3  ffffffff 0
> 80000000     0 >  ffffffff80000000  ffffffff80000000 > *80000000 3  80000000 0
> 00000000     0 >                 0                 0 >  00000000 0  00000000 0
> 7fffffff     0 >          7fffffff          7fffffff >  7fffffff 0  7fffffff 0
> 80000000     1 > *ffffffff80000000          80000000 > *80000000 0  80000000 1
> ffffffff     1 > *ffffffffffffffff          ffffffff > *ffffffff 0  ffffffff 1
> 00000000     1 >         100000000         100000000 >  00000000 1  00000000 1
> 7fffffff     1 >         17fffffff         17fffffff >  7fffffff 1  7fffffff 1
> 80000000     2 > *ffffffff80000000         180000000 > *80000000 1  80000000 2
> ffffffff     2 > *ffffffffffffffff         1ffffffff > *ffffffff 1  ffffffff 2
> 00000000     2 >         200000000         200000000 >  00000000 2  00000000 2
> 7fffffff     2 >         27fffffff         27fffffff >  7fffffff 2  7fffffff 2
> 80000000     3 > *ffffffff80000000         280000000 > *80000000 2  80000000 3
> ffffffff     3 > *ffffffffffffffff         2ffffffff > *ffffffff 2  ffffffff 3
> 00000000     3 >         300000000         300000000 >  00000000 3  00000000 3
> 7fffffff     3 >         37fffffff         37fffffff >  7fffffff 3  7fffffff 3
>
> The values marked with asterisks are wrong.
>
> The problem is that with a 64-bit time, in ext4_decode_extra_time() the
> epoch value is just OR'd with the sign-extended time - which, if negative,
> has all of the upper 32 bits set anyway.  We need to add the epoch instead
> of OR'ing it.  In ext4_encode_extra_time(), the reverse operation needs to
> take place as the 32-bit part of the number of seconds needs to be
> subtracted from the 64-bit value before the epoch is shifted down.
>
> Since the epoch is presumably unsigned, this has the slightly strange
> effect of, for epochs > 0, putting the 0x80000000-0xffffffff range before
> the 0x00000000-0x7fffffff range.
>
> This affects all kernels from v2.6.23-rc1 onwards.
>
> The test program:
>
> #include <stdio.h>
>
> #define EXT4_FITS_IN_INODE(x, y, z) 1
> #define EXT4_EPOCH_BITS 2
> #define EXT4_EPOCH_MASK ((1 << EXT4_EPOCH_BITS) - 1)
> #define EXT4_NSEC_MASK  (~0UL << EXT4_EPOCH_BITS)
>
> #define le32_to_cpu(x) (x)
> #define cpu_to_le32(x) (x)
> typedef unsigned int __le32;
> typedef unsigned int u32;
> typedef signed int s32;
> typedef unsigned long long __u64;
> typedef signed long long s64;
>
> struct timespec {
> long long tv_sec; /* seconds */
> long tv_nsec; /* nanoseconds */
> };
>
> struct ext4_inode_info {
> struct timespec i_crtime;
> };
>
> struct ext4_inode {
> __le32  i_crtime;       /* File Creation time */
> __le32  i_crtime_extra; /* extra FileCreationtime (nsec << 2 | epoch) */
> };
>
> /* Incorrect implementation */
> static inline void ext4_decode_extra_time_bad(struct timespec *time, __le32 extra)
> {
>       if (sizeof(time->tv_sec) > 4)
>       time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
>       << 32;
>       time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
> }
>
> static inline __le32 ext4_encode_extra_time_bad(struct timespec *time)
> {
>       return cpu_to_le32((sizeof(time->tv_sec) > 4 ?
>   (time->tv_sec >> 32) & EXT4_EPOCH_MASK : 0) |
>  ((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK));
> }
>
> /* Fixed implementation */
> static inline void ext4_decode_extra_time_good(struct timespec *time, __le32 _extra)
> {
> u32 extra = le32_to_cpu(_extra);
> u32 epoch = extra & EXT4_EPOCH_MASK;
>
> time->tv_sec = (s32)time->tv_sec + ((s64)epoch  << 32);
> time->tv_nsec = (extra & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
> }
>
> static inline __le32 ext4_encode_extra_time_good(struct timespec *time)
> {
> u32 extra;
> s64 epoch = time->tv_sec - (s32)time->tv_sec;
>
> extra = (epoch >> 32) & EXT4_EPOCH_MASK;
> extra |= (time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK;
> return cpu_to_le32(extra);
> }
>
> #define EXT4_INODE_GET_XTIME_BAD(xtime, inode, raw_inode) \
> do {       \
> (inode)->xtime.tv_sec = (signed)le32_to_cpu((raw_inode)->xtime);       \
> if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra))     \
> ext4_decode_extra_time_bad(&(inode)->xtime, \
>   raw_inode->xtime ## _extra); \
> else       \
> (inode)->xtime.tv_nsec = 0;       \
> } while (0)
>
> #define EXT4_INODE_SET_XTIME_BAD(xtime, inode, raw_inode) \
> do {       \
> (raw_inode)->xtime = cpu_to_le32((inode)->xtime.tv_sec);       \
> if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra))     \
> (raw_inode)->xtime ## _extra =       \
> ext4_encode_extra_time_bad(&(inode)->xtime); \
> } while (0)
>
> #define EXT4_INODE_GET_XTIME_GOOD(xtime, inode, raw_inode) \
> do {       \
> (inode)->xtime.tv_sec = (signed)le32_to_cpu((raw_inode)->xtime);       \
> if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra))     \
> ext4_decode_extra_time_good(&(inode)->xtime,       \
>       raw_inode->xtime ## _extra); \
> else       \
> (inode)->xtime.tv_nsec = 0;       \
> } while (0)
>
> #define EXT4_INODE_SET_XTIME_GOOD(xtime, inode, raw_inode) \
> do {       \
> (raw_inode)->xtime = cpu_to_le32((inode)->xtime.tv_sec);       \
> if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra))     \
> (raw_inode)->xtime ## _extra =       \
> ext4_encode_extra_time_good(&(inode)->xtime); \
> } while (0)
>
> static const struct test {
> unsigned crtime;
> unsigned extra;
> long long sec;
> int nsec;
> } tests[] = {
> // crtime extra tv_sec tv_nsec
> 0xffffffff, 0x00000000, 0xffffffffffffffff, 0,
> 0x80000000, 0x00000000, 0xffffffff80000000, 0,
> 0x00000000, 0x00000000, 0x0000000000000000, 0,
> 0x7fffffff, 0x00000000, 0x000000007fffffff, 0,
> 0x80000000, 0x00000001, 0x0000000080000000, 0,
> 0xffffffff, 0x00000001, 0x00000000ffffffff, 0,
> 0x00000000, 0x00000001, 0x0000000100000000, 0,
> 0x7fffffff, 0x00000001, 0x000000017fffffff, 0,
> 0x80000000, 0x00000002, 0x0000000180000000, 0,
> 0xffffffff, 0x00000002, 0x00000001ffffffff, 0,
> 0x00000000, 0x00000002, 0x0000000200000000, 0,
> 0x7fffffff, 0x00000002, 0x000000027fffffff, 0,
> 0x80000000, 0x00000003, 0x0000000280000000, 0,
> 0xffffffff, 0x00000003, 0x00000002ffffffff, 0,
> 0x00000000, 0x00000003, 0x0000000300000000, 0,
> 0x7fffffff, 0x00000003, 0x000000037fffffff, 0,
> };
>
> int main()
> {
> struct ext4_inode_info ii_bad, ii_good;
> struct ext4_inode raw, *praw = &raw;
> struct ext4_inode raw_bad, *praw_bad = &raw_bad;
> struct ext4_inode raw_good, *praw_good = &raw_good;
> const struct test *t;
> int i, ret = 0;
>
> printf("time     extra   bad decode        good decode         bad encode  good encode\n");
> printf("======== =====   ================= =================   =========== ===========\n");
> for (i = 0; i < sizeof(tests) / sizeof(t[0]); i++) {
> t = &tests[i];
> raw.i_crtime = t->crtime;
> raw.i_crtime_extra = t->extra;
> printf("%08x %5d > ", t->crtime, t->extra);
>
> EXT4_INODE_GET_XTIME_BAD(i_crtime, &ii_bad, praw);
> EXT4_INODE_GET_XTIME_GOOD(i_crtime, &ii_good, praw);
> if (ii_bad.i_crtime.tv_sec != t->sec ||
>    ii_bad.i_crtime.tv_nsec != t->nsec)
> printf("*");
> else
> printf(" ");
> printf("%16llx", ii_bad.i_crtime.tv_sec);
> printf(" ");
> if (ii_good.i_crtime.tv_sec != t->sec ||
>    ii_good.i_crtime.tv_nsec != t->nsec) {
> printf("*");
> ret = 1;
> } else {
> printf(" ");
> }
> printf("%16llx", ii_good.i_crtime.tv_sec);
>
> EXT4_INODE_SET_XTIME_BAD(i_crtime, &ii_good, praw_bad);
> EXT4_INODE_SET_XTIME_GOOD(i_crtime, &ii_good, praw_good);
>
> printf(" > ");
> if (raw_bad.i_crtime != raw.i_crtime ||
>    raw_bad.i_crtime_extra != raw.i_crtime_extra)
> printf("*");
> else
> printf(" ");
> printf("%08llx %d", raw_bad.i_crtime, raw_bad.i_crtime_extra);
> printf(" ");
>
> if (raw_good.i_crtime != raw.i_crtime ||
>    raw_good.i_crtime_extra != raw.i_crtime_extra) {
> printf("*");
> ret = 1;
> } else {
> printf(" ");
> }
> printf("%08llx %d", raw_good.i_crtime, raw_good.i_crtime_extra);
> printf("\n");
> }
>
> return ret;
> }
>
> Signed-off-by: David Howells <[hidden email]>
> ---
>
> fs/ext4/ext4.h |   22 +++++++++++++---------
> 1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index fd1f28be5296..31efcd78bf51 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -723,19 +723,23 @@ struct move_extent {
> <= (EXT4_GOOD_OLD_INODE_SIZE + \
>    (einode)->i_extra_isize)) \
>
> -static inline __le32 ext4_encode_extra_time(struct timespec *time)
> +static inline void ext4_decode_extra_time(struct timespec *time, __le32 _extra)
> {
> -       return cpu_to_le32((sizeof(time->tv_sec) > 4 ?
> -   (time->tv_sec >> 32) & EXT4_EPOCH_MASK : 0) |
> -                          ((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK));
> + u32 extra = le32_to_cpu(_extra);
> + u32 epoch = extra & EXT4_EPOCH_MASK;
> +
> + time->tv_sec = (s32)time->tv_sec + ((s64)epoch  << 32);
Minor nit - two spaces before "<<".

> + time->tv_nsec = (extra & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
> }
>
> -static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra)
> +static inline __le32 ext4_encode_extra_time(struct timespec *time)
> {
> -       if (sizeof(time->tv_sec) > 4)
> -       time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
> -       << 32;
> -       time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
> + u32 extra;
> + s64 epoch = time->tv_sec - (s32)time->tv_sec;
> +
> + extra = (epoch >> 32) & EXT4_EPOCH_MASK;
> + extra |= (time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK;
> + return cpu_to_le32(extra);
> }
David, thanks for moving this patch forward.

It would have been easier to review if the order of encode and decode
functions had not been reversed... :-)

It would be good to get a comment block in the code describing the encoding:

/*
 * We need is an encoding that preserves the times for extra epoch "00":
 *
 * extra                          add to 32-bit
 * epoch                          tv_sec to get
 * bits   decoded 64-bit tv_sec   64-bit value    valid time range
 * 0     -0x80000000..-0x00000001  0x000000000    1901-12-13..1969-12-31
 * 0     0x000000000..0x07fffffff  0x000000000    1970-01-01..2038-01-19
 * 1     0x080000000..0x0ffffffff  0x100000000    2038-01-19..2106-02-07
 * 1     0x100000000..0x17fffffff  0x100000000    2106-02-07..2174-02-25
 * 2     0x180000000..0x1ffffffff  0x200000000    2174-02-25..2242-03-16
 * 2     0x200000000..0x27fffffff  0x200000000    2242-03-16..2310-04-04
 * 3     0x280000000..0x2ffffffff  0x300000000    2310-04-04..2378-04-22
 * 3     0x300000000..0x37fffffff  0x300000000    2378-04-22..2446-05-10
 *
 * Note that previous versions of the kernel on 64-bit systems would
 * incorrectly use extra epoch bits 1,1 for dates between 1901 and 1970.
 * e2fsck will correct this, assuming that it is run on the affected
 * filesystem before 2242.
 */


The only other question is whether you compile tested this on a 32-bit
system?  IIRC, the "sizeof(time->tv_sec)" check was to avoid compiler
warnings due to assigning values too large for the data type, and (to
a lesser extent) avoiding overhead on those systems.

If there is no 32-bit compile warning then I'm fine with this as-is,
since systems with 32-bit tv_sec are going to be broken at that point
in any case.

Cheers, Andreas






signature.asc (850 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 01/12] Ext4: Fix extended timestamp encoding and decoding

Theodore Ts'o
In reply to this post by David Howells
This is the patch I would prefer to use (and in fact which I have
added to the ext4 tree):

There are issues with 32-bit vs 64-bit encoding of times before
January 1, 1970, which are handled with this patch which is not
handled with what you have in your patch series.  So I'd prefer if you
drop this patch, and I'll get this sent to Linus as a bug fix for 4.4.

Cheers,

                                        - Ted


commit e0d738b05d484487b7e1e3c6d537da8bbef80c86
Author: David Turner <[hidden email]>
Date:   Tue Nov 24 14:34:37 2015 -0500

    ext4: Fix handling of extended tv_sec
    In ext4, the bottom two bits of {a,c,m}time_extra are used to extend
    the {a,c,m}time fields, deferring the year 2038 problem to the year
    2446.
   
    When decoding these extended fields, for times whose bottom 32 bits
    would represent a negative number, sign extension causes the 64-bit
    extended timestamp to be negative as well, which is not what's
    intended.  This patch corrects that issue, so that the only negative
    {a,c,m}times are those between 1901 and 1970 (as per 32-bit signed
    timestamps).
   
    Some older kernels might have written pre-1970 dates with 1,1 in the
    extra bits.  This patch treats those incorrectly-encoded dates as
    pre-1970, instead of post-2311, until kernel 4.20 is released.
    Hopefully by then e2fsck will have fixed up the bad data.
   
    Also add a comment explaining the encoding of ext4's extra {a,c,m}time
    bits.
   
    Signed-off-by: David Turner <[hidden email]>
    Signed-off-by: Theodore Ts'o <[hidden email]>
    Reported-by: Mark Harris <[hidden email]>
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732
    Cc: [hidden email]

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 750063f..fddce29 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -26,6 +26,7 @@
 #include <linux/seqlock.h>
 #include <linux/mutex.h>
 #include <linux/timer.h>
+#include <linux/version.h>
 #include <linux/wait.h>
 #include <linux/blockgroup_lock.h>
 #include <linux/percpu_counter.h>
@@ -727,19 +728,53 @@ struct move_extent {
  <= (EXT4_GOOD_OLD_INODE_SIZE + \
     (einode)->i_extra_isize)) \
 
+/*
+ * We use an encoding that preserves the times for extra epoch "00":
+ *
+ * extra  msb of                         adjust for signed
+ * epoch  32-bit                         32-bit tv_sec to
+ * bits   time    decoded 64-bit tv_sec  64-bit tv_sec      valid time range
+ * 0 0    1    -0x80000000..-0x00000001  0x000000000 1901-12-13..1969-12-31
+ * 0 0    0    0x000000000..0x07fffffff  0x000000000 1970-01-01..2038-01-19
+ * 0 1    1    0x080000000..0x0ffffffff  0x100000000 2038-01-19..2106-02-07
+ * 0 1    0    0x100000000..0x17fffffff  0x100000000 2106-02-07..2174-02-25
+ * 1 0    1    0x180000000..0x1ffffffff  0x200000000 2174-02-25..2242-03-16
+ * 1 0    0    0x200000000..0x27fffffff  0x200000000 2242-03-16..2310-04-04
+ * 1 1    1    0x280000000..0x2ffffffff  0x300000000 2310-04-04..2378-04-22
+ * 1 1    0    0x300000000..0x37fffffff  0x300000000 2378-04-22..2446-05-10
+ *
+ * Note that previous versions of the kernel on 64-bit systems would
+ * incorrectly use extra epoch bits 1,1 for dates between 1901 and
+ * 1970.  e2fsck will correct this, assuming that it is run on the
+ * affected filesystem before 2242.
+ */
+
 static inline __le32 ext4_encode_extra_time(struct timespec *time)
 {
-       return cpu_to_le32((sizeof(time->tv_sec) > 4 ?
-   (time->tv_sec >> 32) & EXT4_EPOCH_MASK : 0) |
-                          ((time->tv_nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK));
+ u32 extra = sizeof(time->tv_sec) > 4 ?
+ ((time->tv_sec - (s32)time->tv_sec) >> 32) & EXT4_EPOCH_MASK : 0;
+ return cpu_to_le32(extra | (time->tv_nsec << EXT4_EPOCH_BITS));
 }
 
 static inline void ext4_decode_extra_time(struct timespec *time, __le32 extra)
 {
-       if (sizeof(time->tv_sec) > 4)
-       time->tv_sec |= (__u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK)
-       << 32;
-       time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
+ if (unlikely(sizeof(time->tv_sec) > 4 &&
+ (extra & cpu_to_le32(EXT4_EPOCH_MASK)))) {
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4,20,0)
+ /* Handle legacy encoding of pre-1970 dates with epoch
+ * bits 1,1.  We assume that by kernel version 4.20,
+ * everyone will have run fsck over the affected
+ * filesystems to correct the problem.
+ */
+ u64 extra_bits = le32_to_cpu(extra) & EXT4_EPOCH_MASK;
+ if (extra_bits == 3)
+ extra_bits = 0;
+ time->tv_sec += extra_bits << 32;
+#else
+ time->tv_sec += (u64)(le32_to_cpu(extra) & EXT4_EPOCH_MASK) << 32;
+#endif
+ }
+ time->tv_nsec = (le32_to_cpu(extra) & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
 }
 
 #define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode)       \
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 02/12] statx: Provide IOC flags for Windows fs attributes

Theodore Ts'o
In reply to this post by David Howells
On Fri, Nov 20, 2015 at 02:54:47PM +0000, David Howells wrote:
> Provide IOC flags for Windows fs attributes so that they can be retrieved (or
> even altered) using the FS_IOC_[GS]ETFLAGS ioctl and read using statx().

We have a real problem with numberspace used by FS_IOC_[GS]ETFLAGS.
This ioctl was originally defined for use with ext2, but then later on
other file systems decided they wanted to use the same ioctl for their
file system, and so now we have the situation where some of the bits
are used in a way where they are intended to be file system
independent, and some are file system specific.  And ext[234] use
these bits as its on-disk encoding --- which given that
FS_IOC[GS]ETFLAGS was originally ext[234] specific, is understandable,
but it makes things really messy at this point.

So for example, the bits you have chosen are in conflict with ext4's use:

#define EXT4_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
#define EXT4_PROJINHERIT_FL 0x20000000 /* Create with parents projid */

> +#define FS_HIDDEN_FL 0x10000000 /* Windows hidden file attribute */
> +#define FS_SYSTEM_FL 0x20000000 /* Windows system file attribute */
> +#define FS_ARCHIVE_FL 0x40000000 /* Windows archive file attribute */


As a result, I would suggest that we not try to use the
FS_IOC_[GS]ETFLAGS number scheme for any new interface, so we're at
least not making a bad situation worse.

The only reason why some other file systems have chosen to use
FS_IOC_[GS]ETFLAGS, instead of defining their own ioctl, is so they
can use lsattr/chattr from e2fsprogs instead of creating their own
utility.  But for statx, there isn't a good reason use the same flags
number space.  At the very least, can we use a new flags field for the
Windows file attributes?  It's not like lsattr/chattr has the ability
to set those flags today anyway.  So we might as well use a new flags
field and a new flags numberspace for them.

                                                - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 01/12] Ext4: Fix extended timestamp encoding and decoding

Arnd Bergmann
In reply to this post by Theodore Ts'o
On Tuesday 24 November 2015 14:36:46 Theodore Ts'o wrote:
> This is the patch I would prefer to use (and in fact which I have
> added to the ext4 tree):
>
> There are issues with 32-bit vs 64-bit encoding of times before
> January 1, 1970, which are handled with this patch which is not
> handled with what you have in your patch series.  So I'd prefer if you
> drop this patch, and I'll get this sent to Linus as a bug fix for 4.4.

I'm happy with either one. Apparently both Davids have arrived with
almost the same algorithm and implementation, with the exception of
the pre-1970 handling you mention there.

        Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 03/12] statx: Add a system call to make enhanced file info available

Dave Chinner
In reply to this post by David Howells
On Fri, Nov 20, 2015 at 02:54:57PM +0000, David Howells wrote:
> The defined bits in st_ioc_flags are the usual FS_xxx_FL, plus some extra
> flags that might be supplied by the filesystem.  Note that Ext4 returns
> flags outside of {EXT4,FS}_FL_USER_VISIBLE in response to FS_IOC_GETFLAGS.
> Should {EXT4,FS}_FL_USER_VISIBLE be extended to cover them?  Or should the
> extra flags be suppressed?

Quite frankly, we should not expose flags owned by a filesystem like
this. Create a new set of flagsi that are exposed by the syscall,
and every filesystem is responsible for translating their internal
flag values to the syscall flag values....

> The defined bits in the st_information field give local system data on a
> file, how it is accessed, where it is and what it does:
>
> STATX_INFO_ENCRYPTED File is encrypted
> STATX_INFO_TEMPORARY File is temporary (NTFS/CIFS/deleted)
> STATX_INFO_FABRICATED File was made up by filesystem
> STATX_INFO_KERNEL_API File is kernel API (eg: procfs/sysfs)
> STATX_INFO_REMOTE File is remote
> STATX_INFO_OFFLINE File is offline (CIFS)
> STATX_INFO_AUTOMOUNT Dir is automount trigger
> STATX_INFO_AUTODIR Dir provides unlisted automounts
> STATX_INFO_NONSYSTEM_OWNERSHIP File has non-system ownership details
> STATX_INFO_REPARSE_POINT File is reparse point (NTFS/CIFS)

        STATX_INFO_XATTR File/dir has extended attrs

... just like these STATX_INFO flags are filesystem independent...

And, FWIW, I'd like to see more than one local filesystem supported
in the initial patchset (e.g. btrfs) and also have all their
inode/fs flags exposed so we don't end up encoding weird
ext4-specific feature quirks into the API.....

Cheers,

Dave.
--
Dave Chinner
[hidden email]
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [RFC][PATCH 00/12] Enhanced file stat system call

J. Bruce Fields
In reply to this post by David Howells
On Fri, Nov 20, 2015 at 04:28:35PM +0000, David Howells wrote:

> Martin Steigerwald <[hidden email]> wrote:
>
> > Any plans to add limitations of filesystem to the call like maximum file
> > size?  I know its mostly relevant for just for FAT32, but on any account
> > rather than trying to write 4 GiB and then file, it would be good to at some
> > time get a dialog at the beginning of the copy.
>
> Adding filesystem limits can be done.  I got a shopping list of things people
> wanted a while back and I've worked off of that list.  I can add other things
> - that's on of the reasons I left room for expansion.

I ran across systemd/src/basic/path-util.c:fd_is_mount_point() the other
day, and the contortions it goes through made me wonder if we should
also add mnt_id and/or an is_mountpoint boolean--it's annoying to have
to do name_to_handle_at() (not supported on all filesystems) just to get
mnt_id.

(Looking at it now I see it falls back on reading mount id from
/proc/self/fdinfo/<fd>.  Maybe that's good enough.  May depend on
whether there's a potential user that doesn't want to assume access to
/proc?)

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [RFC][PATCH 00/12] Enhanced file stat system call

Andreas Dilger-7
On Nov 25, 2015, at 10:51 AM, J. Bruce Fields <[hidden email]> wrote:

>
> On Fri, Nov 20, 2015 at 04:28:35PM +0000, David Howells wrote:
>> Martin Steigerwald <[hidden email]> wrote:
>>
>>> Any plans to add limitations of filesystem to the call like maximum file
>>> size?  I know its mostly relevant for just for FAT32, but on any account
>>> rather than trying to write 4 GiB and then file, it would be good to at some
>>> time get a dialog at the beginning of the copy.
>>
>> Adding filesystem limits can be done.  I got a shopping list of things people
>> wanted a while back and I've worked off of that list.  I can add other things
>> - that's on of the reasons I left room for expansion.
>
> I ran across systemd/src/basic/path-util.c:fd_is_mount_point() the other
> day, and the contortions it goes through made me wonder if we should
> also add mnt_id and/or an is_mountpoint boolean--it's annoying to have
> to do name_to_handle_at() (not supported on all filesystems) just to get
> mnt_id.
>
> (Looking at it now I see it falls back on reading mount id from
> /proc/self/fdinfo/<fd>.  Maybe that's good enough.  May depend on
> whether there's a potential user that doesn't want to assume access to
> /proc?)
IMHO, it should be possible to get information about a file or directory
from the file itself (i.e. statx() or fsinfo() on the path/fd), rather than
having to grub around in a /proc file that the application magically has to
know about, and parse text files there for every file being handled.

Cheers, Andreas






signature.asc (850 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [RFC][PATCH 00/12] Enhanced file stat system call

David Howells
In reply to this post by Christoph Hellwig
Christoph Hellwig <[hidden email]> wrote:

> from a quick look the statx bits looks fine in general.  I think Ted
> last time had a problem with the IOC flag allocation, so you might
> want to ping him.

Yeah - he commented on that.

> But fsinfo with the multiplexer and the void pointer is just horrible.
> What were you thinking there?

I think the fsinfo data struct probably wants splitting up.  Now this could be
done in a number of ways, including:

 (1) By adding multiple syscalls (statfsx, fsinfo, netfsinfo, ...) but each
     one needs a bit in the kernel to handle the basics (path/fd lookup,
     security check, buffer allocation and freeing, ...) which could all be in
     common - hence the mux method.

 (2) Adding an argument to the fsinfo syscall since it has at least one
     syscall argument spare.

 (3) Offloading some of the bits to standardised xattr calls.  The large
     string fields (domain name, volume name, ...) would seem to be obvious
     candidates for this.

Given that the core VFS gets to manage the contents of the buffer, it
shouldn't be as controversial as pioctl().

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 01/12] Ext4: Fix extended timestamp encoding and decoding

David Howells
In reply to this post by Theodore Ts'o
Theodore Ts'o <[hidden email]> wrote:

> This is the patch I would prefer to use (and in fact which I have
> added to the ext4 tree):
>
> There are issues with 32-bit vs 64-bit encoding of times before
> January 1, 1970, which are handled with this patch which is not
> handled with what you have in your patch series.  So I'd prefer if you
> drop this patch, and I'll get this sent to Linus as a bug fix for 4.4.

Fine by me.

Acked-by: David Howells <[hidden email]>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 02/12] statx: Provide IOC flags for Windows fs attributes

David Howells
In reply to this post by Theodore Ts'o
Theodore Ts'o <[hidden email]> wrote:

> As a result, I would suggest that we not try to use the
> FS_IOC_[GS]ETFLAGS number scheme for any new interface, so we're at
> least not making a bad situation worse.
>
> The only reason why some other file systems have chosen to use
> FS_IOC_[GS]ETFLAGS, instead of defining their own ioctl, is so they
> can use lsattr/chattr from e2fsprogs instead of creating their own
> utility.  But for statx, there isn't a good reason use the same flags
> number space.  At the very least, can we use a new flags field for the
> Windows file attributes?  It's not like lsattr/chattr has the ability
> to set those flags today anyway.  So we might as well use a new flags
> field and a new flags numberspace for them.

Hmmm...  I was trying to make it so that these bits would be saved to disk as
part of the IOC flags so that Samba could make use of them.  I guess they'll
have to be stored in an xattr instead.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 02/12] statx: Provide IOC flags for Windows fs attributes

David Howells
David Howells <[hidden email]> wrote:

> > As a result, I would suggest that we not try to use the
> > FS_IOC_[GS]ETFLAGS number scheme for any new interface, so we're at
> > least not making a bad situation worse.
> >
> > The only reason why some other file systems have chosen to use
> > FS_IOC_[GS]ETFLAGS, instead of defining their own ioctl, is so they
> > can use lsattr/chattr from e2fsprogs instead of creating their own
> > utility.  But for statx, there isn't a good reason use the same flags
> > number space.  At the very least, can we use a new flags field for the
> > Windows file attributes?  It's not like lsattr/chattr has the ability
> > to set those flags today anyway.  So we might as well use a new flags
> > field and a new flags numberspace for them.
>
> Hmmm...  I was trying to make it so that these bits would be saved to disk as
> part of the IOC flags so that Samba could make use of them.  I guess they'll
> have to be stored in an xattr instead.

Or as Dave Chinner suggested, I can put them elsewhere and let the FS deal
with them in its own way.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [RFC][PATCH 00/12] Enhanced file stat system call

Andreas Dilger-7
In reply to this post by David Howells
On Nov 26, 2015, at 8:19 AM, David Howells <[hidden email]> wrote:

>
> Christoph Hellwig <[hidden email]> wrote:
>
>> from a quick look the statx bits looks fine in general.  I think Ted
>> last time had a problem with the IOC flag allocation, so you might
>> want to ping him.
>
> Yeah - he commented on that.
>
>> But fsinfo with the multiplexer and the void pointer is just horrible.
>> What were you thinking there?
>
> I think the fsinfo data struct probably wants splitting up.
Could we separate the statx() and fsinfo() submissions so that this doesn't
block statx() landing indefinitely?  I think people are generally in support
of statx() as it is today, and it's been _sooo_ long in coming that I'd hate
to see it delayed further.  The statx() syscall definitely has value without
fsinfo() to improve the life of network filesystems.

Cheers, Andreas

>  Now this could be
> done in a number of ways, including:
>
> (1) By adding multiple syscalls (statfsx, fsinfo, netfsinfo, ...) but each
>     one needs a bit in the kernel to handle the basics (path/fd lookup,
>     security check, buffer allocation and freeing, ...) which could all be in
>     common - hence the mux method.
>
> (2) Adding an argument to the fsinfo syscall since it has at least one
>     syscall argument spare.
>
> (3) Offloading some of the bits to standardised xattr calls.  The large
>     string fields (domain name, volume name, ...) would seem to be obvious
>     candidates for this.
>
> Given that the core VFS gets to manage the contents of the buffer, it
> shouldn't be as controversial as pioctl().
>
> David
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [hidden email]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Cheers, Andreas






signature.asc (850 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 02/12] statx: Provide IOC flags for Windows fs attributes

Andreas Dilger-7
In reply to this post by David Howells
On Nov 26, 2015, at 8:35 AM, David Howells <[hidden email]> wrote:

>
> Theodore Ts'o <[hidden email]> wrote:
>
>> As a result, I would suggest that we not try to use the
>> FS_IOC_[GS]ETFLAGS number scheme for any new interface, so we're at
>> least not making a bad situation worse.
>>
>> The only reason why some other file systems have chosen to use
>> FS_IOC_[GS]ETFLAGS, instead of defining their own ioctl, is so they
>> can use lsattr/chattr from e2fsprogs instead of creating their own
>> utility.  But for statx, there isn't a good reason use the same flags
>> number space.  At the very least, can we use a new flags field for the
>> Windows file attributes?  It's not like lsattr/chattr has the ability
>> to set those flags today anyway.  So we might as well use a new flags
>> field and a new flags numberspace for them.
>
> Hmmm...  I was trying to make it so that these bits would be saved to disk as
> part of the IOC flags so that Samba could make use of them.  I guess they'll
> have to be stored in an xattr instead.
The flags can be part of the same flags word in the statx() API, and how they
are stored on disk is a filesystem implementation detail.  In some cases, not
all of the flags can be stored on disk (e.g. FAT or whatever) and an error
will be returned.  In other cases they can be stored efficiently as bits in
the inode, and other filesystems may chose to store them as internal xattrs.
That shouldn't be the concern of the statx() syscall.

Cheers, Andreas






signature.asc (850 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 01/12] Ext4: Fix extended timestamp encoding and decoding

Theodore Ts'o
In reply to this post by Arnd Bergmann
On Tue, Nov 24, 2015 at 09:10:53PM +0100, Arnd Bergmann wrote:

> On Tuesday 24 November 2015 14:36:46 Theodore Ts'o wrote:
> > This is the patch I would prefer to use (and in fact which I have
> > added to the ext4 tree):
> >
> > There are issues with 32-bit vs 64-bit encoding of times before
> > January 1, 1970, which are handled with this patch which is not
> > handled with what you have in your patch series.  So I'd prefer if you
> > drop this patch, and I'll get this sent to Linus as a bug fix for 4.4.
>
> I'm happy with either one. Apparently both Davids have arrived with
> almost the same algorithm and implementation, with the exception of
> the pre-1970 handling you mention there.

I was doing some testing on x86, which leads me to ask --- what's the
current thinking about post y2038 on 32-bit platforms such as x86?  I
see that there was some talk about using struct timespec64, but we
haven't made the transition in the VFS interfaces yet, despite a
comment in an LWN article from 2014 stating that "the first steps have
been taken; hopefully the rest will follow before too long".

Cheers,

                                        - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH 01/12] Ext4: Fix extended timestamp encoding and decoding

Arnd Bergmann
On Saturday 28 November 2015 21:45:55 Theodore Ts'o wrote:

> On Tue, Nov 24, 2015 at 09:10:53PM +0100, Arnd Bergmann wrote:
> > On Tuesday 24 November 2015 14:36:46 Theodore Ts'o wrote:
> > > This is the patch I would prefer to use (and in fact which I have
> > > added to the ext4 tree):
> > >
> > > There are issues with 32-bit vs 64-bit encoding of times before
> > > January 1, 1970, which are handled with this patch which is not
> > > handled with what you have in your patch series.  So I'd prefer if you
> > > drop this patch, and I'll get this sent to Linus as a bug fix for 4.4.
> >
> > I'm happy with either one. Apparently both Davids have arrived with
> > almost the same algorithm and implementation, with the exception of
> > the pre-1970 handling you mention there.
>
> I was doing some testing on x86, which leads me to ask --- what's the
> current thinking about post y2038 on 32-bit platforms such as x86?  I
> see that there was some talk about using struct timespec64, but we
> haven't made the transition in the VFS interfaces yet, despite a
> comment in an LWN article from 2014 stating that "the first steps have
> been taken; hopefully the rest will follow before too long".

The approach in my initial VFS series was to introduce 'struct inode_time',
but I have basically abandoned that idea now, after we decided to introduce
'timespec64' inside of the kernel and use that for other subsystems.
The rought plan is now to have separate time64_t and u32 seconds/nanoseconds
values in 'struct inode', 'struct iattr' and 'struct kstat' and use
inline functions or macros to extract or set them as time64_t or timespec64
in file system code, but that code is not written yet.

I'm mostly coordinating the y2038 work at the moment, but that means that
a lot of the work is going into individual drivers that a single person
can easily handle. We've had a couple of people who tried looking at VFS,
but none of them followed through, so it got delayed a bit. However,
Deepa Dinamani is now looking y2038 for VFS and individual file systems
as part of her Outreachy internship and I'm optimistic that we'll soon
be making progress again here with her work.

The other large missing piece is the system call implementation. I have
posted a series earlier this year before my parental leave, and it's
currently lacking review from libc folks, and blocked on me to update
the series and post it again.

        Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [hidden email]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
123