In computing, a file server (or fileserver) is a computer attached to a network that provides a location for shared disk access, i.e. shared storage of computer files (such as documents, sound files, photographs, movies, images, and databases) that can be accessed by any workstation able to reach the server over the network. The term server highlights the role of the machine in the client–server scheme, where the clients are the workstations using the storage. A file server commonly does not perform computational tasks or run programs on behalf of its clients. It is designed primarily to enable the storage and retrieval of data, while the computation is carried out by the workstations.
File servers are commonly found in schools and offices, where users use a LAN to connect their client computers.
File servers may also be categorized by the method of access: Internet file servers are frequently accessed by File Transfer Protocol (FTP) or by HTTP (but are different from web servers, which often provide dynamic web content in addition to static files). Servers on a LAN are usually accessed using the SMB/CIFS protocol (Windows and Unix-like systems) or the NFS protocol (Unix-like systems).
Design of file servers:
In modern businesses the design of file servers is complicated by competing demands for storage space, access speed, recoverability, ease of administration, security, and budget.
The primary piece of hardware equipment for servers over the last couple of decades has proven to be the hard disk drive. Although other forms of storage are viable (such as magnetic tape and solid-state drives) disk drives have continued to offer the best fit for cost, performance, and capacity.
Since the crucial function of a file server is storage, technology has been developed to operate multiple disk drives together as a team, forming a disk array. A disk array typically has cache (temporary memory storage that is faster than the magnetic disks), as well as advanced functions like RAID and storage virtualization. Disk arrays typically increase availability by using redundant components in addition to RAID, such as redundant power supplies. Disk arrays may be consolidated or virtualized in a SAN.
2. Network Attached Storage (NAS):
Network-attached storage (NAS) is file-level computer data storage connected to a computer network providing data access to a heterogeneous group of clients. A NAS device differs from a file server in general in that a NAS is a computer appliance, a specialized computer built from the ground up for serving files, rather than a general-purpose computer being used for serving files (possibly alongside other functions). In discussions of NAS, the term “file server” generally serves as a contrasting term, referring to general-purpose computers only.
As of 2010, NAS devices were gaining popularity as a convenient method for sharing files between multiple computers. Potential benefits of network-attached storage, compared to non-dedicated file servers, include faster data access, easier administration, and simple configuration.
NAS systems are networked appliances containing one or more hard drives, often arranged into logical, redundant storage containers or RAID arrays. Network-attached storage removes the responsibility of file serving from other servers on the network. NAS devices typically provide access to files using network file sharing protocols such as NFS, SMB/CIFS (Server Message Block/Common Internet File System), or AFP.
A. RAID (redundant array of independent disks) is a data storage virtualization technology that combines multiple physical disk drive components into a single logical unit for the purposes of data redundancy, performance improvement, or both. Data is distributed across the drives in one of several ways, referred to as RAID levels, depending on the required level of redundancy and performance. The different schemes, or data distribution layouts, are named by the word RAID followed by a number, for example RAID 0 or RAID 1. Each scheme, or RAID level, provides a different balance among the key goals: reliability, availability, performance, and capacity. RAID levels greater than RAID 0 provide protection against unrecoverable sector read errors, as well as against failures of whole physical drives.
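The capacity side of this balance can be illustrated with a little arithmetic. The following is a minimal Python sketch, assuming identically sized drives and covering only RAID 0, 1, 5, and 6. The function name `usable_capacity` and the minimum of two drives for RAID 0 and RAID 1 are assumptions made for illustration; real controllers may behave differently with mixed drive sizes.

```python
def usable_capacity(level, drive_count, drive_size):
    """Return the usable capacity of an array of identical drives.

    Illustrative only: assumes equal drive sizes and supports
    RAID 0, 1, 5, and 6.
    """
    minimum_drives = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum_drives:
        raise ValueError(f"unsupported RAID level: {level}")
    if drive_count < minimum_drives[level]:
        raise ValueError(
            f"RAID {level} needs at least {minimum_drives[level]} drives"
        )
    if level == 0:
        # Striping only: every drive contributes its full capacity.
        return drive_count * drive_size
    if level == 1:
        # Mirroring: every drive holds the same data.
        return drive_size
    # RAID 5 spends one drive's worth of space on parity, RAID 6 two.
    parity_drives = 1 if level == 5 else 2
    return (drive_count - parity_drives) * drive_size


# Four 2 TB drives under each layout (sizes in TB).
capacities = {level: usable_capacity(level, 4, 2) for level in (0, 1, 5, 6)}
```

With four 2 TB drives, the sketch yields 8 TB for RAID 0, 2 TB for RAID 1, 6 TB for RAID 5, and 4 TB for RAID 6, which shows how redundancy is paid for in capacity.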
RAID Standard levels:
· RAID 0 consists of striping, without mirroring or parity. The capacity of a RAID 0 volume is the sum of the capacities of the disks in the set, the same as with a spanned volume. There is no added redundancy for handling disk failures, just as with a spanned volume. Thus, failure of one disk causes the loss of the entire RAID 0 volume, with reduced possibilities of data recovery when compared with a broken spanned volume. Striping distributes the contents of files roughly equally among all disks in the set, which makes concurrent read or write operations on the multiple disks almost inevitable and results in performance improvements. The concurrent operations make the throughput of most read and write operations equal to the throughput of one disk multiplied by the number of disks. Increased throughput is the big benefit of RAID 0 versus spanned volume, at the cost of increased vulnerability to drive failures.
· RAID 1 consists of data mirroring, without parity or striping. Data is written identically to two drives, thereby producing a “mirrored set” of drives. Thus, any read request can be serviced by any drive in the set. If a request is broadcast to every drive in the set, it can be serviced by the drive that accesses the data first (depending on its seek time and rotational latency), improving performance. Sustained read throughput, if the controller or software is optimized for it, approaches the sum of throughputs of every drive in the set, just as for RAID 0. Actual read throughput of most RAID 1 implementations is slower than the fastest drive. Write throughput is always slower because every drive must be updated, and the slowest drive limits the write performance. The array continues to operate as long as at least one drive is functioning.
· RAID 2 consists of bit-level striping with dedicated Hamming-code parity. All disk spindle rotation is synchronized and data is striped such that each sequential bit is on a different drive. Hamming-code parity is calculated across corresponding bits and stored on at least one parity drive. This level is of historical significance only; although it was used on some early machines (for example, the Thinking Machines CM-2), as of 2014 it is not used by any commercially available system.
· RAID 3 consists of byte-level striping with dedicated parity. All disk spindle rotation is synchronized and data is striped such that each sequential byte is on a different drive. Parity is calculated across corresponding bytes and stored on a dedicated parity drive. Although implementations exist, RAID 3 is not commonly used in practice.
· RAID 4 consists of block-level striping with dedicated parity. This level was previously used by NetApp, but has now been largely replaced by a proprietary implementation of RAID 4 with two parity disks, called RAID-DP. The main advantage of RAID 4 over RAID 2 and 3 is I/O parallelism: in RAID 2 and 3, a single read/write I/O operation requires reading the whole group of data drives, while in RAID 4 one I/O read/write operation does not have to spread across all data drives. As a result, more I/O operations can be executed in parallel, improving the performance of small transfers.
· RAID 5 consists of block-level striping with distributed parity. Unlike RAID 4, parity information is distributed among the drives, requiring all drives but one to be present to operate. Upon failure of a single drive, subsequent reads can be calculated from the distributed parity such that no data is lost. RAID 5 requires at least three disks. RAID 5 implementations are susceptible to system failures because of trends regarding array rebuild time and the chance of drive failure during rebuild. Rebuilding an array requires reading all data from all disks, opening a chance for a second drive failure and the loss of the entire array. In August 2012, Dell posted an advisory against the use of RAID 5 in any configuration on Dell EqualLogic arrays and RAID 50 with “Class 2 7200 RPM drives of 1 TB and higher capacity” for business-critical data.
· RAID 6 consists of block-level striping with double distributed parity. Double parity provides fault tolerance up to two failed drives. This makes larger RAID groups more practical, especially for high-availability systems, as large-capacity drives take longer to restore. RAID 6 requires a minimum of four disks. As with RAID 5, a single drive failure results in reduced performance of the entire array until the failed drive has been replaced. With a RAID 6 array, using drives from multiple sources and manufacturers, it is possible to mitigate most of the problems associated with RAID 5. The larger the drive capacities and the larger the array size, the more important it becomes to choose RAID 6 instead of RAID 5. RAID 10 also minimizes these problems.
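The single-drive recovery described for the parity-based levels relies on XOR parity: the parity block of a stripe is the XOR of its data blocks, so any one missing block equals the XOR of all the blocks that survive. The following is a minimal Python sketch of that idea, not a real RAID implementation. The function names `xor_blocks` and `rebuild_missing` and the 4-byte blocks are assumptions chosen for illustration, and the second parity of RAID 6 is not covered.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recompute the single missing block of a stripe from the survivors.

    The survivors are the remaining data blocks plus the parity block.
    """
    return xor_blocks(surviving_blocks)


# One stripe spread over three data drives plus parity.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Simulate losing the second drive and rebuilding its block.
recovered = rebuild_missing([data[0], data[2], parity])
assert recovered == data[1]
```

Note that rebuilding needs every surviving block of the stripe, which is why a rebuild has to read all the remaining drives.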
B. Nested (hybrid) RAID:
In what was originally termed hybrid RAID, many storage controllers allow RAID levels to be nested. The elements of a RAID may be either individual drives or arrays themselves. Arrays are rarely nested more than one level deep.
The final array is known as the top array. When the top array is RAID 0 (such as in RAID 1+0 and RAID 5+0), most vendors omit the “+” (yielding RAID 10 and RAID 50, respectively).
· RAID 0+1 creates two stripes and mirrors them. If a single drive fails, one of the stripes has failed, and the array is effectively running as RAID 0 with no redundancy. A rebuild carries significantly higher risk than with RAID 1+0, because all the data from all the drives in the remaining stripe has to be read rather than the data from just one drive, which increases the chance of an unrecoverable read error (URE) and significantly extends the rebuild window.
· RAID 1+0 creates a striped set from a series of mirrored drives. The array can sustain multiple drive losses so long as no mirror loses all its drives.
· JBOD RAID N+N: With JBOD (Just a Bunch Of Disks), it is possible to concatenate not only disks but also volumes such as RAID sets. With larger drive capacities, write and rebuilding time may increase dramatically (especially, as described above, with RAID 5 and RAID 6). By splitting larger RAID sets into smaller subsets and concatenating them with JBOD, write and rebuilding time may be reduced. If a hardware RAID controller is not capable of nesting JBOD with RAID, then JBOD can be achieved with software RAID in combination with RAID set volumes offered by the hardware RAID controller. There is another advantage in the form of disaster recovery: if a small RAID subset fails, the data on the other RAID subsets is not lost, reducing restore time.
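The difference in fault tolerance between RAID 0+1 and RAID 1+0 can be checked by enumerating drive failures. The following is a minimal Python sketch under simplifying assumptions: four drives numbered 0 to 3, grouped as two pairs in both layouts, with drives either fully working or fully failed. The function names are hypothetical and chosen for illustration.

```python
from itertools import combinations


def raid10_survives(mirrors, failed):
    """RAID 1+0 survives while every mirror keeps at least one working drive."""
    return all(mirror - failed for mirror in mirrors)


def raid01_survives(stripes, failed):
    """RAID 0+1 survives while at least one stripe has lost no drive at all."""
    return any(not (stripe & failed) for stripe in stripes)


def count_survivable(survives, groups, failures=2):
    """Count the failure combinations of the given size that the array survives."""
    drives = sorted(set().union(*groups))
    return sum(
        survives(groups, set(combo)) for combo in combinations(drives, failures)
    )


groups = [{0, 1}, {2, 3}]  # two mirrors (RAID 1+0) or two stripes (RAID 0+1)
raid10_ok = count_survivable(raid10_survives, groups)  # 4 of 6 double failures
raid01_ok = count_survivable(raid01_survives, groups)  # 2 of 6 double failures
```

In this four-drive example, RAID 1+0 survives four of the six possible double failures, while RAID 0+1 survives only two, because any failure in the second stripe after the first stripe is already broken loses the array.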
What is a Spanned Volume?:
The topic of spanned volumes brings us to non-RAID drive architectures.
C. Non-RAID drive architectures:
The most widespread standard for configuring multiple hard disk drives is RAID (Redundant Array of Inexpensive/Independent Disks), which comes in a number of standard configurations and non-standard configurations. Non-RAID drive architectures also exist, and are referred to by acronyms with similarity to RAID.
- JBOD (abbreviated from “just a bunch of disks/drives”): an architecture using multiple hard disk drives exposed as individual devices. The drives may be treated independently or may be combined into one or more logical volumes using a volume manager like LVM or mdadm; such volumes are usually called “spanned”, “linear”, “SPAN”, or “BIG”. A spanned volume provides no redundancy, so failure of a single hard drive amounts to failure of the whole logical volume. Redundancy for resilience and/or bandwidth improvement may be provided, in software, at a higher level.
- SPAN or BIG: a method of combining the free space on multiple hard disk drives from a JBOD to create a spanned volume. Concatenation or spanning of drives is not one of the numbered RAID levels, but it is a popular method for combining multiple physical disk drives into a single logical disk. It provides no data redundancy: drives are merely concatenated, end to beginning, so that they appear to be a single large disk. The names SPAN and BIG are just the words “span” and “big”, not acronyms. What distinguishes a SPAN or BIG from RAID configurations is the freedom in selecting drives: while RAID usually requires all drives to be of similar capacity, and the same or similar drive models are preferred for performance reasons, a spanned volume has no such requirements and often contains mismatched types and sizes of hard disk drives.
- MAID (derived from “massive array of idle drives”): an architecture using hundreds to thousands of hard disk drives for providing nearline storage of data, primarily designed for “Write Once, Read Occasionally” (WORO) applications, in which increased storage density and decreased cost are traded for increased latency and decreased redundancy.
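The contrast between the concatenation used by a spanned volume and the striping used by RAID 0 comes down to how a logical block number is mapped to a physical drive. The following is a minimal Python sketch of both mappings, assuming a stripe unit of exactly one block and hypothetical function names. Real volume managers use larger stripe units and keep additional metadata.

```python
def locate_spanned(block, disk_sizes):
    """Map a logical block to (disk index, block on that disk) in a concatenation.

    Disks are filled one after another, so their sizes may differ.
    """
    if block < 0:
        raise IndexError("block number must not be negative")
    for disk, size in enumerate(disk_sizes):
        if block < size:
            return disk, block
        block -= size
    raise IndexError("block is beyond the end of the volume")


def locate_striped(block, disk_count):
    """Map a logical block to (disk index, block on that disk) in RAID 0.

    Consecutive blocks rotate across the disks (stripe unit of one block).
    """
    if block < 0 or disk_count < 1:
        raise IndexError("invalid block number or disk count")
    return block % disk_count, block // disk_count


# A spanned volume built from three mismatched disks (sizes in blocks).
sizes = [100, 50, 200]
spanned = [locate_spanned(b, sizes) for b in (0, 99, 100, 120, 150)]
# -> [(0, 0), (0, 99), (1, 0), (1, 20), (2, 0)]

# The first six blocks of a three-disk stripe set.
striped = [locate_striped(b, 3) for b in range(6)]
# -> [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```

In the spanned layout, neighbouring blocks usually sit on the same disk, so a sequential transfer keeps one drive busy. In the striped layout, neighbouring blocks land on different disks, which is what allows the concurrent operations and higher throughput attributed to RAID 0.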
From the mid-1990s, NAS devices began gaining popularity as a convenient method of sharing files among multiple computers.
A NAS unit is a computer connected to a network that provides only file-based data storage services to other devices on the network. Although it may technically be possible to run other software on a NAS unit, it is usually not designed to be a general-purpose server. For example, NAS units usually do not have a keyboard or display, and are controlled and configured over the network, often using a browser.
A full-featured operating system is not needed on a NAS device, so often a stripped-down operating system is used. For example, FreeNAS or NAS4Free, both open source NAS solutions designed for commodity PC hardware, are implemented as a stripped-down version of FreeBSD.
NAS uses file-based protocols such as NFS (popular on UNIX systems), SMB/CIFS (Server Message Block/Common Internet File System) (used with MS Windows systems), AFP (used with Apple Macintosh computers), or NCP (used with OES and Novell NetWare). NAS units rarely limit clients to a single protocol.
The key difference between direct-attached storage (DAS) and NAS is that DAS is simply an extension to an existing server and is not necessarily networked. NAS is designed as an easy and self-contained solution for sharing files over the network.
Both DAS and NAS can potentially increase availability of data by using RAID or clustering.
When both are served over the network, NAS could have better performance than DAS, because the NAS device can be tuned precisely for file serving, which is less likely to happen on a server responsible for other processing. Both NAS and DAS can have varying amounts of cache memory, which greatly affects performance. When comparing use of NAS with use of local (non-networked) DAS, the performance of NAS depends mainly on the speed of and congestion on the network.
NAS is generally not as customizable in terms of hardware (CPU, memory, storage components) or software (extensions, plug-ins, additional protocols) as a general-purpose server supplied with DAS.
NAS provides both storage and a file system. This is often contrasted with SAN (Storage Area Network), which provides only block-based storage and leaves file system concerns on the “client” side. SAN protocols include Fibre Channel, iSCSI, ATA over Ethernet (AoE) and HyperSCSI.
One way to loosely conceptualize the difference between a NAS and a SAN is that NAS appears to the client OS (operating system) as a file server (the client can map network drives to shares on that server) whereas a disk available through a SAN still appears to the client OS as a disk, visible in disk and volume management utilities (along with client’s local disks), and available to be formatted with a file system and mounted.
Despite their differences, SAN and NAS are not mutually exclusive, and may be combined as a SAN-NAS hybrid, offering both file-level protocols (NAS) and block-level protocols (SAN) from the same system. An example of this is Openfiler, a free software product running on Linux-based systems. A shared disk file system can also be run on top of a SAN to provide filesystem service.
NAS is useful for more than just general centralized storage provided to client computers in environments with large amounts of data. NAS can enable simpler and lower cost systems such as load-balancing and fault-tolerant email and web server systems by providing storage services. The potential emerging market for NAS is the consumer market where there is a large amount of multi-media data. Such consumer market appliances are now commonly available. Unlike their rackmounted counterparts, they are generally packaged in smaller form factors. The price of NAS appliances has plummeted in recent years, offering flexible network-based storage to the home consumer market for little more than the cost of a regular USB or FireWire external hard disk. Many of these home consumer devices are built around ARM, PowerPC or MIPS processors running an embedded Linux operating system.
A clustered NAS is a NAS that uses a distributed file system running simultaneously on multiple servers. The key difference between a clustered and a traditional NAS is the ability to distribute (e.g. stripe) data and metadata across the cluster nodes or storage devices. Clustered NAS, like a traditional one, still provides unified access to the files from any of the cluster nodes, regardless of the actual location of the data.
File servers generally offer some form of system security to limit access to files to specific users or groups. In large organizations, this task is usually delegated to directory services such as OpenLDAP, Novell’s eDirectory or Microsoft’s Active Directory.
These servers work within a hierarchical computing environment, which treats users, computers, applications and files as distinct but related entities on the network and grants access based on user or group credentials. In many cases, the directory service spans many file servers, potentially hundreds for large organizations. In the past, and in smaller organizations, authentication could take place directly at the server itself.
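The idea of granting access based on user or group credentials can be sketched in a few lines. The following Python example is purely illustrative: the function `can_access`, the dictionaries standing in for a directory and an access control list, and all user, group, and share names are hypothetical. It does not use or imitate the API of OpenLDAP, eDirectory, or Active Directory.

```python
def can_access(user, share, group_members, acl):
    """Return True if the user, or a group the user belongs to, is allowed.

    group_members maps a group name to the set of its users.
    acl maps a share name to the set of allowed user or group names.
    """
    allowed = acl.get(share, set())
    if user in allowed:
        return True
    return any(user in group_members.get(name, set()) for name in allowed)


# Hypothetical directory data shared by every file server in the organization.
group_members = {
    "accounting": {"alice", "bob"},
    "engineering": {"carol"},
}
acl = {
    "/shares/invoices": {"accounting"},
    "/shares/designs": {"engineering", "alice"},
}

alice_invoices = can_access("alice", "/shares/invoices", group_members, acl)
carol_invoices = can_access("carol", "/shares/invoices", group_members, acl)
```

Because the group membership lives in one place, a change such as adding a user to a group takes effect on every file server that consults the same directory, which is the main reason large organizations centralize this task.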
File and Storage Services:
File and Storage Services includes technologies that help you set up and manage one or more file servers, which are servers that provide central locations on your network where you can store files and share them with users. If your users need access to the same files and applications, or if centralized backup and file management are important to your organization, you should set up one or more servers as a file server by installing the File and Storage Services role and the appropriate role services.
The File and Storage Services role and the Storage Services role service are installed by default, but without any additional role services. This basic functionality enables you to use Server Manager or Windows PowerShell to manage the storage functionality of your servers. However, to set up or manage a file server, you should use the Add Roles and Features Wizard in Server Manager or the Install-WindowsFeature Windows PowerShell cmdlet to install additional File and Storage Services role services, such as the role services discussed in this topic.
Administrators can use the File and Storage Services role to set up and manage multiple file servers and their storage capabilities by using Server Manager or Windows PowerShell. Some of the specific applications include the following:
Storage Spaces – Use to deploy high availability storage that is resilient and scalable by using cost-effective industry-standard disks.
Folder Redirection, Offline Files, and Roaming User Profiles – Use to redirect the path of local folders (such as the Documents folder) or an entire user profile to a network location, while caching the contents locally for increased speed and availability.
Work Folders – Use to enable users to store and access work files on personal PCs and devices, in addition to corporate PCs. Users gain a convenient location to store work files and access them from anywhere. Organizations maintain control over corporate data by storing the files on centrally managed file servers and optionally specifying user device policies (such as encryption and lock screen passwords). Work Folders is a new role service in Windows Server 2012 R2.
Data Deduplication – Use to reduce the disk space requirements of your files, saving money on storage.
iSCSI Target Server – Use to create centralized, software-based, and hardware-independent iSCSI disk subsystems in storage area networks (SANs).
Work Folders – Provides a consistent way for users to access their work files from their personal computers and devices. See Work Folders for more information.
Server Message Block – Enhancements include automatic rebalancing of Scale-Out File Server clients, improved performance of SMB Direct, and improved SMB event messages. See What’s New in SMB for more information.
Storage Spaces – Enhancements include SSD and HDD storage tiers, an SSD-based write-back cache, parity space support for failover clusters, dual parity support, and greatly decreased storage space rebuild times. See What’s New in Storage Spaces for more information.
DFS Replication – Enhancements include database cloning for large performance gains during initial sync, a Windows PowerShell module for DFS Replication, a new DFS Replication WMI provider, faster replication on high bandwidth connections, conflict and preexisting data recovery, and support for rebuilding corrupt databases without unexpected data loss. See What’s New in DFS Replication and DFS Namespaces for more information.
iSCSI Target Server – Provides block storage to other servers and applications on the network by using the Internet SCSI (iSCSI) standard. Updates include virtual disk enhancements, manageability enhancements in a hosted or private cloud, and improved optimization to allow disk-level caching. See What’s New in iSCSI Target Server for more information.
Data Deduplication – Saves disk space by storing a single copy of identical data on the volume.
Storage Spaces and storage pools – Enable you to virtualize storage by grouping industry-standard disks into storage pools and then creating storage spaces from the available capacity in the storage pools.
Unified remote management of File and Storage Services in Server Manager – Enables you to remotely manage multiple file servers, including their role services and storage, from a single window.
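The principle behind data deduplication, storing a single copy of identical data, can be illustrated with a small content-addressed store. The following Python sketch is an assumption-laden illustration only: the class `DedupStore`, the fixed-size chunking, and the use of SHA-256 digests are choices made for this example and do not describe how the Windows Server Data Deduplication feature is implemented.

```python
import hashlib


class DedupStore:
    """Toy in-memory store that keeps each distinct chunk only once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes, stored once
        self.files = {}   # file name -> ordered list of chunk digests

    def write(self, name, data):
        """Split data into fixed-size chunks and store only the unseen ones."""
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        """Reassemble a file from its chunk references."""
        return b"".join(self.chunks[digest] for digest in self.files[name])

    def stored_bytes(self):
        """Return the number of bytes actually kept after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore(chunk_size=4)
store.write("report_v1", b"AAAABBBB")
store.write("report_v2", b"AAAACCCC")  # shares its first chunk with report_v1
```

The two files hold 16 bytes of logical data, but the store keeps only 12 bytes because the shared chunk is stored once, and both files still read back unchanged.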
Non-RAID Drive Architectures (Concatenation – SPAN and BIG)