How Do Ext4 Extents Work?
Ext2 and Ext3 limited how large a file could be. They used 32-bit block numbers to address data blocks, so at most 2^32 blocks could be addressed: 2^32 * block size (e.g. 4 KiB) = 16 TiB. Access to large files was also slow because it had to go through several levels of indirection. Ext4 uses 48 bits to address a block, so it can support much larger sizes. It also uses extents to store the block mapping, so access is faster for large files.
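The arithmetic above can be checked with a few lines of C. This is only a sketch: the function name addressable_bytes is made up for illustration, and the 4 KiB block size is just the example value used above.

```c
#include <stdint.h>

/* Bytes addressable with `bits`-bit block numbers and the given block size.
 * Hypothetical helper; the result is only meaningful while it fits in
 * 64 bits, i.e. bits + log2(block_size) < 64. */
uint64_t addressable_bytes(unsigned bits, uint64_t block_size)
{
    return ((uint64_t)1 << bits) * block_size;
}
```

With 4 KiB blocks, addressable_bytes(32, 4096) is 2^44 bytes (16 TiB) and addressable_bytes(48, 4096) is 2^60 bytes (1 EiB).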
The information about the data blocks is stored in the i_data field of the inode structure. Without extents, the first 12 entries of i_data hold the block numbers of the first 12 data blocks. The next entry holds the block number of the indirect block, which contains an array of block numbers that point to the data. Similarly, there are a double indirect block and a triple indirect block. So to get data from a very large file, we need to go through those levels of indirection.
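To make the cost of indirection concrete, here is a minimal sketch that reports how many indirect blocks have to be read before the data block of a given logical block is reached. It assumes the layout described above (12 direct entries, then single, double and triple indirect blocks) and 4-byte block numbers; the function name and the N_DIRECT macro are made up for illustration.

```c
#include <stdint.h>

#define N_DIRECT 12u  /* direct block entries in i_data (illustrative name) */

/* Returns the number of indirect blocks that must be read to map
 * logical block `lblk`: 0 = direct, 1 = single indirect, 2 = double,
 * 3 = triple, -1 = beyond what the scheme can map. */
int indirection_level(uint64_t lblk, uint32_t block_size)
{
    uint64_t ptrs = block_size / 4;   /* 32-bit block numbers per block */

    if (lblk < N_DIRECT)
        return 0;
    lblk -= N_DIRECT;
    if (lblk < ptrs)
        return 1;
    lblk -= ptrs;
    if (lblk < ptrs * ptrs)
        return 2;
    lblk -= ptrs * ptrs;
    if (lblk < ptrs * ptrs * ptrs)
        return 3;
    return -1;
}
```

With 4 KiB blocks, everything past logical block 1049611 (roughly the first 4 GiB of the file) needs three extra block reads per lookup.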
How to determine whether an inode uses extents or indirect mapping?
To determine whether an inode uses extent-based mapping or indirect mapping, we need to look at the EXT2_EXTENTS_FL bit in the i_flags field of the inode structure. The root directory always has indirect mapping instead of extent mapping.
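The check is a single bit test. In the sketch below the flag value 0x80000 is the one the Linux ext4 headers use for the extents flag; the function name is made up for illustration, and i_flags is assumed to have been converted to host byte order already.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef EXT2_EXTENTS_FL
#define EXT2_EXTENTS_FL 0x00080000u  /* inode uses extents */
#endif

/* Returns true if the inode's block map is an extent tree,
 * false if it uses the classic direct/indirect block map. */
bool inode_uses_extents(uint32_t i_flags)
{
    return (i_flags & EXT2_EXTENTS_FL) != 0;
}
```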
What does the Ext4 extent data structure look like and where is it stored?
In extent-based block mapping, the i_data of the inode contains extent structures. There are an extent header, an extent and an extent index. The following definitions describe those structures.
/*
 * This is the extent on-disk structure.
 * It's used at the bottom of the tree.
 */
struct ext4_extent {
    uint32_t ee_block;    /* first logical block extent covers */
    uint16_t ee_len;      /* number of blocks covered by extent */
    uint16_t ee_start_hi; /* high 16 bits of physical block */
    uint32_t ee_start_lo; /* low 32 bits of physical block */
};

/*
 * This is index on-disk structure.
 * It's used at all the levels except the bottom.
 */
struct ext4_extent_idx {
    uint32_t ei_block;    /* index covers logical blocks from 'block' */
    uint32_t ei_leaf_lo;  /* pointer to the physical block of the next
                           * level. leaf or next index could be there */
    uint16_t ei_leaf_hi;  /* high 16 bits of physical block */
    uint16_t ei_unused;
};

/*
 * Each block (leaves and indexes), even inode-stored has header.
 */
struct ext4_extent_header {
    uint16_t eh_magic;      /* probably will support different formats */
    uint16_t eh_entries;    /* number of valid entries */
    uint16_t eh_max;        /* capacity of store in entries */
    uint16_t eh_depth;      /* has tree real underlying blocks? */
    uint32_t eh_generation; /* generation of the tree */
};
(NOTE: The fields are stored little-endian on disk, and the Linux code uses the __le16 and __le32 types for them.)
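Because the fields are little-endian on disk, a portable reader should decode them byte by byte instead of casting the buffer to a struct. The following is a minimal sketch of that, which also shows how ee_start_hi and ee_start_lo combine into one 48-bit physical block number. The helper names (rd_le16, rd_le32, extent_start) are made up for illustration; the byte offsets follow the ext4_extent layout above.

```c
#include <stdint.h>

/* Read little-endian values from a raw byte buffer, on any host. */
uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Physical start block of a raw 12-byte ext4_extent:
 * bytes 0-3 ee_block, 4-5 ee_len, 6-7 ee_start_hi, 8-11 ee_start_lo. */
uint64_t extent_start(const uint8_t *raw)
{
    uint64_t hi = rd_le16(raw + 6);
    uint64_t lo = rd_le32(raw + 8);
    return (hi << 32) | lo;
}
```

The same pattern applies to ei_leaf_hi and ei_leaf_lo in the index structure.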
The extent map is implemented as a B+ tree. Only the leaf nodes hold extent structures; the other nodes hold extent index structures. Every node starts with the header, followed by either extents or extent indexes. The i_data field is 60 bytes, so it has room only for the 12-byte header and 4 more 12-byte entries. If more extent information needs to be stored, then whole blocks are used for the extra nodes, and the entries in i_data become extent indexes. Each extent index contains the block number of the block where the next level of nodes is stored.
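Once a leaf node has been reached, mapping a logical block to a physical block is a search over that leaf's extents. The sketch below assumes the entries have already been decoded into host byte order and are sorted by their first logical block; the struct and function names are made up for illustration. It also ignores the special ee_len values that ext4 uses to mark unwritten extents, so treat it as an outline rather than a complete implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-order copy of one leaf entry (hypothetical helper type). */
struct extent {
    uint32_t block;  /* ee_block: first logical block covered */
    uint16_t len;    /* ee_len: number of blocks covered */
    uint64_t start;  /* ee_start_hi:ee_start_lo combined */
};

/* Binary search over extents sorted by `block`.
 * Returns the physical block for `lblk`, or 0 if no extent covers it. */
uint64_t map_block(const struct extent *ex, size_t n, uint32_t lblk)
{
    size_t lo = 0, hi = n;

    /* Find the first extent whose start is greater than lblk. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ex[mid].block <= lblk)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return 0;                      /* lblk is before the first extent */

    const struct extent *e = &ex[lo - 1];
    if (lblk - e->block >= e->len)
        return 0;                      /* lblk falls in a hole */
    return e->start + (lblk - e->block);
}
```

In an index node the same search would pick the last entry whose ei_block is less than or equal to the wanted block, and the lookup would continue in the block that entry points to.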
The logic can be described by the following 2 figures.
The eh_magic field in the header is always 0xf30a (stored little-endian).
The eh_entries field gives the number of valid entries in that node's array.
The eh_depth field is the depth of the B+ tree. If the depth is 0, the entries in that array are always extent structures and not extent index structures.
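These three fields are enough to sanity-check a node before reading its entries. Here is a minimal sketch, assuming the 12 header bytes are available in a raw buffer; the struct and function names are made up for illustration and are not taken from ext2read or the kernel.

```c
#include <stdint.h>

#define EXTENT_MAGIC 0xF30Au  /* expected eh_magic value */

/* Host-order copy of the extent header (hypothetical helper type). */
struct hdr {
    uint16_t magic, entries, max, depth;
    uint32_t generation;
};

/* Decode a little-endian 12-byte header from `raw`.
 * Returns 0 on success, -1 if the magic or entry count looks wrong. */
int parse_header(const uint8_t *raw, struct hdr *h)
{
    h->magic   = (uint16_t)(raw[0] | (raw[1] << 8));
    h->entries = (uint16_t)(raw[2] | (raw[3] << 8));
    h->max     = (uint16_t)(raw[4] | (raw[5] << 8));
    h->depth   = (uint16_t)(raw[6] | (raw[7] << 8));
    h->generation = (uint32_t)raw[8] | ((uint32_t)raw[9] << 8) |
                    ((uint32_t)raw[10] << 16) | ((uint32_t)raw[11] << 24);

    if (h->magic != EXTENT_MAGIC || h->entries > h->max)
        return -1;
    return 0;
}

/* Depth 0 means the entries after the header are extents, not indexes. */
int node_is_leaf(const struct hdr *h)
{
    return h->depth == 0;
}
```

Note that the magic appears on disk as the bytes 0x0a, 0xf3 in that order.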
I have implemented this feature in the ext2read project (http://ext2read.sf.net). It hasn't been tested with very large files yet, though. Ext4 also uses an HTree for directory entries. I am currently studying and implementing it.
Please let me know if you have any questions.
Manish Regmi (regmi dot manish at gmail dot com)
Thank you for this article, it really helped me. I didn't find a lot of documentation on ext4 extents.
Thanks a lot, you saved my day.
Hi everyone,
I have a VMDK file containing an ext4 filesystem.
I want to develop a program that extracts files from this VMDK file.
How would I extract the superblock information and other information programmatically?
Thanks in advance