PC specification for Mascot Server

Introduction

Any recent, high specification PC containing either Intel or AMD processor(s) should make a suitable platform for Mascot. If you are buying a new PC, then a dual processor system, or one which can be upgraded to dual processors, will be a good investment. Systems with more than two processors usually carry a substantial price premium. If you plan to do high throughput work, and need to run Mascot on more than two processors, a cluster of dual processor boxes will usually offer the most cost effective solution.

If you don’t have time to read the whole of this document, then choose a system with the highest speed quad core processor(s) available, at least 4GB of RAM (preferably 8GB), a 64 bit operating system, and the largest SATA hard disks available (at least 500GB). For a 2 CPU Mascot license, you will need a computer with 2 quad core processors.

This presentation, given at our ASMS Workshop and User Meeting in 2010, contains additional information.

Processor (CPU)

The two main PC processor manufacturers are Intel and AMD. Matrix Science does not currently support Mascot on other PC processors such as Itanium.

We have observed excellent scalability with dual processor systems running under Microsoft Windows and Linux. That is, throughput from a dual processor system comes very close to double that obtained from a single processor. However, we cannot predict or guarantee the scalability of Mascot on hardware configurations that have not been specifically tested.

Processor Speed

The main factors affecting Mascot performance are processor clock speed and number of cores.

It is not possible to compare processor speeds directly for different architectures. For example, a system with a 1.8GHz AMD Opteron will run a Mascot search in about the same time as another system with a 3.2GHz Intel Xeon processor. For any given processor type, the search speed will be proportional to processor speed unless:

  • Disk access becomes a bottleneck, possibly because the FASTA sequence database has to be read into memory from disk (see section on RAM below) or
  • The processor cache is too small and causes a bottleneck between processor and memory.

Multiple Cores

Intel and AMD both released their first dual core processors in early 2005. Most Mascot searches scale linearly with the number of cores.

Mascot 2.3 supports up to 4 cores per licence. For example, to use all cores on a system with:

  • 1 x quad core processor requires a single processor Mascot licence
  • 1 x eight core processor requires a dual processor Mascot licence
  • 2 x dual core processors requires a single processor Mascot licence
  • 2 x 6-core processors requires a 3 processor Mascot licence
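
The examples above amount to dividing the total number of cores by four and rounding up. A minimal Python sketch of that arithmetic, assuming the Mascot 2.3 limit of 4 cores per licence quoted above; the function name licences_needed is illustrative and not part of Mascot:

```python
import math

CORES_PER_LICENCE = 4  # Mascot 2.3 limit quoted in the text


def licences_needed(sockets: int, cores_per_socket: int) -> int:
    """Return the number of processor licences needed to use every core."""
    if sockets < 1 or cores_per_socket < 1:
        raise ValueError("sockets and cores_per_socket must be positive")
    total_cores = sockets * cores_per_socket
    # Round up: a partly used block of 4 cores still needs a whole licence.
    return math.ceil(total_cores / CORES_PER_LICENCE)


if __name__ == "__main__":
    # The four configurations listed in the text.
    for sockets, cores in [(1, 4), (1, 8), (2, 2), (2, 6)]:
        count = licences_needed(sockets, cores)
        print(f"{sockets} x {cores}-core: {count} licence(s)")
```

This only reproduces the rule as stated here; check the licence terms for your Mascot version before purchasing.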

For versions of Mascot prior to Mascot 2.3 on systems with dual or quad core processors, Mascot is licensed on a “per socket” basis. A system with 2 x dual core processors, for example, requires a 2 cpu Mascot license to use all 4 cores.

Mascot 2.2 is configured so that it will not run on a system with more than 4 cores per physical cpu. So, six core processors, for example, are not supported with Mascot 2.2. Support for processors with more than 4 cores was introduced in Mascot 2.3. For Nehalem quad core processors, version 2.2.06 or later (patch available here) is required.

Mascot 2.0 and later for Windows and Linux has full support for dual core Intel processors. For example, a single processor Mascot license will use both cores on a dual core Intel Pentium D. In this case, the number of threads should be set to 2 using the database maintenance utility.

Mascot 2.1 and later for Windows and Linux has full support for quad core Intel processors. The number of threads should be set to 4 times the number of cpus licensed using the database maintenance utility.

Mascot 2.1.02 and later for Windows and Linux has full support for dual core AMD Opteron processors. As with the Intel processors, the number of threads should be set to 2 times the number of cpus licensed using the database maintenance utility.

Mascot 2.2 and later for Windows and Linux has full support for quad core AMD processors. The number of threads should be set to 4 times the number of cpus licensed using the database maintenance utility.

On a cluster system, Mascot 2.1 or later (Windows or Linux) is required to make full use of either AMD or Intel dual or quad core nodes. The nodelist.txt file should specify the number of physical cpus (sockets). Mascot will automatically create the correct number of threads on the node.

64 Bit

Many of the recent Intel and AMD processors are “64 bit” or, in Intel terms, have “Intel EM64T” technology. 32 bit applications can also run on these processors. All versions of Mascot will run on 64 bit Linux, but Mascot 2.2 or later is required for 64 bit Windows. For earlier versions of Mascot, you must install standard 32 bit Windows.

There is negligible performance difference between 32 and 64 bit Mascot. The main advantage in using a 64 bit system is that it is possible to install and access more memory. Installing more memory may help when multiple searches are being run at the same time and it will also enable extremely large reports to be loaded without running out of memory. Mascot searches are split into ‘chunks’ and therefore all searches can be run on a 32 bit system without running out of address space.

To take advantage of 64 bit Windows, the 64 bit release of ActiveState Perl (provided on the Mascot CD) must be installed. Mascot 2.2 includes a 64 bit Parser (for the reports) but only 32 bit binaries for the search engine. Mascot 2.3 includes 32 and 64 bit binaries.

Mascot 2.2 and later for Linux includes both 32 and 64 bit binaries. Support for 32 bit Linux will be dropped in Mascot 2.5.

Hyper-Threading Technology

This is available on Intel processors, but not AMD. Hyper-Threading works by duplicating certain sections of the processor – those that store the architectural state – but not duplicating the main execution resources. This allows a Hyper-Threading equipped processor to pretend to be two “logical” processors to the host operating system, allowing the operating system to schedule two threads or processes simultaneously.

When HT is enabled, 2 logical “processors” per physical processor will be visible to Mascot. So, for example, a single physical Xeon 5000 processor with dual cores and HT will appear to have 4 cpus. In this case, the number of threads should be set to 4 using the database maintenance utility.
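
The thread settings given in this section and under Multiple Cores follow one pattern: one thread per logical processor on the licensed cpus. A small Python sketch of that arithmetic, assuming per-socket licensing as in the examples in the text; thread_count is a hypothetical helper, not a Mascot utility:

```python
def thread_count(licensed_cpus: int, cores_per_cpu: int,
                 hyper_threading: bool = False) -> int:
    """Return the thread setting implied by the examples in the text.

    Assumption: every core on each licensed cpu is used, and Hyper-Threading
    doubles the number of logical processors visible to Mascot.
    """
    if licensed_cpus < 1 or cores_per_cpu < 1:
        raise ValueError("licensed_cpus and cores_per_cpu must be positive")
    logical_per_core = 2 if hyper_threading else 1
    return licensed_cpus * cores_per_cpu * logical_per_core


if __name__ == "__main__":
    # Single dual core Xeon 5000 with HT enabled, as in the example above.
    print(thread_count(1, 2, hyper_threading=True))
    # Two licensed quad core cpus without HT.
    print(thread_count(2, 4))
```

The resulting value would then be entered by hand in the database maintenance utility.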

Hyper-threading can give up to a 12% performance increase. It is not equivalent to a true multi-core processor.

Versions prior to Mascot 2.0 did not support HT, so for Mascot 1.9 and earlier, HT needs to be disabled in the BIOS.

Hyper-threading does not count towards the number of cores used in Mascot 2.3.

Cache Size

An on-board memory cache is used by the CPU to reduce the average time to access the main memory. The cache uses faster memory to store copies of the data from the most frequently used main memory locations. As long as most memory accesses are to the cache, the processor will be able to run at nearly full speed. If the cache is too small, then the processor will often be waiting for data from main memory, and searches will run more slowly. For this reason, we don’t recommend the Intel Celeron processors, which have a rather small cache. However, it seems that there is only about a 5% performance increase when going from 512kB cache to 1MB. We don’t have figures for going from 1MB to 2MB cache.

Intel Pentium and AMD Processors

We do not have space for computers with all the different CPU types and speeds, so we only have limited benchmark information. We have found that performance for searches under identical conditions is roughly proportional to the results from this benchmark test. (Remember to ‘divide by 2’ when comparing the [Dual CPU] systems with the single processor systems). The performance ratios published here are similar but slightly harder to understand.

We have experienced some issues with the Athlon 64 processor, when trying to lock databases into memory under Windows, and therefore don’t recommend this processor. Opteron processors don’t exhibit this problem.

Intel Itanium Processors

Matrix Science does not support Itanium processors. Although Mascot will run in the 32 bit compatibility mode, the performance will be very poor. It will also be necessary to install a 32 bit version of Perl, which may be non-trivial. A test Mascot port to native Itanium processors gave poor performance compared with Pentium processors, so Matrix Science has no plans to provide an Itanium version.

Random Access Memory (RAM)

RAM requirements are strongly dependent on the selection of databases you plan to search.

Mascot Monitor makes a compressed copy of each FASTA database, in which the title lines have been removed and the sequence strings have been packed in a byte efficient manner. The compressed copy of each database is mapped into RAM and, if there is sufficient room, can even be locked into memory.

When a search calls for a database that is not in memory, the search duration is increased by the time taken to read the database from disk. For a search that takes longer than a couple of minutes, this additional time will be negligible. For a short search, for example a PMF or an MS/MS search of a few spectra, reading from disk may take longer than the search itself.

Databases should always be memory mapped, even though a system might not have sufficient physical RAM to hold them all. Memory mapping only consumes virtual address space, and enables the file to be accessed more efficiently. However, it doesn’t guarantee that a particular database will be in memory when a search calls for it; some other process may have kicked it out. So, it may be advantageous to lock a small, frequently searched database into memory, guaranteeing that it is always resident in RAM.

Whether you have sufficient RAM to lock a database in memory can be estimated from the size of the FASTA file. For a protein database, the required RAM is roughly 80% of the FASTA file size, while for a nucleic acid database it is roughly 50%. Some examples are given in the following table, but the comprehensive sequence databases increase significantly in size every month.

Database | FASTA (MB) | RAM (MB) | Compression
Swiss-Prot | 161 | 134 | 1 : 0.83
NCBInr | 3,260 | 2,687 | 1 : 0.82
EST_others | 25,471 | 12,887 | 1 : 0.51

You also need to allow approximately 60 MB for the operating system (Windows) and at least 150 MB for each executing Mascot search. We do not currently recommend that NCBInr or MSDB be locked into memory except when using a cluster or 64 bit Mascot on Linux.
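
The rules of thumb in this section can be combined into a rough RAM budget. The Python sketch below uses only the figures quoted above (80% of FASTA size for protein databases, 50% for nucleic acid, 60 MB for Windows, 150 MB per executing search); the function names are illustrative, and the result is an estimate rather than a measured requirement:

```python
PROTEIN_RATIO = 0.8    # approximate compressed size / FASTA size (protein)
NUCLEIC_RATIO = 0.5    # approximate compressed size / FASTA size (nucleic acid)
OS_OVERHEAD_MB = 60    # allowance for the operating system (Windows)
PER_SEARCH_MB = 150    # minimum allowance per executing Mascot search


def locked_ram_mb(fasta_mb: float, nucleic: bool = False) -> float:
    """Estimate the RAM needed to lock one database into memory."""
    ratio = NUCLEIC_RATIO if nucleic else PROTEIN_RATIO
    return fasta_mb * ratio


def total_ram_mb(databases, concurrent_searches: int = 1) -> float:
    """Estimate total RAM for locked databases, the OS and running searches.

    databases is an iterable of (fasta_mb, nucleic) pairs.
    """
    locked = sum(locked_ram_mb(size, nucleic) for size, nucleic in databases)
    return locked + OS_OVERHEAD_MB + PER_SEARCH_MB * concurrent_searches


if __name__ == "__main__":
    # Swiss-Prot (161 MB FASTA, protein) locked, two searches running at once.
    print(round(total_ram_mb([(161, False)], concurrent_searches=2)), "MB")
```

Because the comprehensive databases grow every month, the FASTA sizes should be taken from the files actually installed rather than from the table.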

In practice, it is rarely sensible for a database as large as EST_others to be locked in memory. Being composed of short stretches of nucleic acid sequence, it is not suitable for peptide mass fingerprint searches, and tends to be used as a database of last resort for large searches, where the overhead of reading it from disk represents only a small part of the total search time.

From January 2008 onwards, it is not possible to even map the EST_others database in memory on a 32 bit system.

Hard Disk Storage

The Mascot program files require very little disk space in comparison to the sequence databases and the accumulating result files.

For the sequence databases, you will need to maintain free disk space of the order of 3 times the largest database. This is because, during a database update, there may be the current FASTA file, reference file and its associated compressed files plus the equivalent for the incoming database. Mascot also keeps a copy of one previous database. Current (February 2008) disk requirements for the common databases are:

Database | Total size of files (GB) | Max disk space (GB)
Swiss-Prot | 2 | 6
NCBInr | 6 | 18
MSDB | 8 | 24
EST_human | 8 | 24
EST_mouse | 5 | 15
EST_others | 40 | 120
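
In the table, the maximum disk space for each database is three times the total size of its files. The Python sketch below applies that factor to the February 2008 sizes and sums the result. Adding up every database is an assumption made here to get a single planning figure; few sites will install all of them:

```python
HEADROOM_FACTOR = 3  # current files + incoming update + one previous copy

# Total size of files in GB, February 2008 figures from the table.
DATABASE_SIZES_GB = {
    "Swiss-Prot": 2,
    "NCBInr": 6,
    "MSDB": 8,
    "EST_human": 8,
    "EST_mouse": 5,
    "EST_others": 40,
}


def max_disk_gb(size_gb: float) -> float:
    """Return the peak disk space one database may need during an update."""
    return size_gb * HEADROOM_FACTOR


def total_disk_gb(sizes: dict) -> float:
    """Return the peak disk space if every listed database is installed."""
    return sum(max_disk_gb(size) for size in sizes.values())


if __name__ == "__main__":
    for name, size in DATABASE_SIZES_GB.items():
        print(f"{name}: {max_disk_gb(size)} GB")
    print("All databases:", total_disk_gb(DATABASE_SIZES_GB), "GB")  # 207 GB
```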

It would not be unreasonable to allow 200GB for databases, and this could grow to 400GB within 2 years. However, it is unusual to require all three EST databases.

The space needed for result files depends on the overall search profile and on how long results are to remain on-line. Individual result file sizes range from 20 kB for a peptide mass fingerprint search through to several hundreds of MB for a large LC-MS/MS dataset.

Disk drives are very inexpensive, and most PCs support up to four SATA devices. It is difficult to have too much disk space, especially if you plan to search databases similar in size to dbEST.

If any databases are not memory mapped, short searches may be disk I/O bound, and a fast disk (e.g. fast wide SCSI) or a disk array (e.g. RAID) can then become an important factor in maximising throughput.

Operating System

Microsoft Windows

The following versions of Windows are supported:

Operating System | Max CPU | Max RAM (GB)
2000 Professional | 2 | 4
2000 Server | 4 | 4
2000 Advanced Server | 8 | 8
2000 Data Center | 32 | 32
XP Professional | 2 | 4
XP Professional – 64 bit edition | 2 | 16
2003 Web Edition | 2 | 2
2003 Standard Edition | 4 | 4
2003 Enterprise Edition | 8 | 32
2003 Data Center Edition | 32 | 64
2003 Standard Edition – 64 bit | 4 | 32
2003 Enterprise Edition – 64 bit | 8 | 2048
2003 Data Center Edition – 64 bit | 64 | 2048
Vista Home Premium – 32 bit | 1 | 4
Vista Home Premium – 64 bit | 1 | 16
Vista Business – 32 bit | 2 | 4
Vista Business – 64 bit | 2 | 128
Vista Enterprise – 32 bit | 2 | 4
Vista Enterprise – 64 bit | 2 | 128
Vista Ultimate – 32 bit | 2 | 4
Vista Ultimate – 64 bit | 2 | 128
2008 Server Web Edition – 32 bit | 4 | 4
2008 Server Web Edition – 64 bit | 4 | 32
2008 Server Standard Edition – 32 bit | 4 | 4
2008 Server Standard Edition – 64 bit | 4 | 32
2008 Server Enterprise Edition – 32 bit | 8 | 64
2008 Server Enterprise Edition – 64 bit | 8 | 2048
2008 Server Datacenter Edition – 32 bit | 32 | 64
2008 Server Datacenter Edition – 64 bit | 64 | 2048
Windows 7, Ultimate, Enterprise and Professional – 32 bit | 2 | 4
Windows 7, Ultimate, Enterprise and Professional – 64 bit | 2 | 192

  • Support for Windows NT4 SP6 was discontinued in 2007.
  • Support for Windows 2000 will be discontinued in July 2010.
  • Microsoft Windows XP home is not supported.
  • Microsoft Windows Vista Home Basic and Starter editions are not supported.
  • Windows Vista and Windows 2008 server require Mascot 2.2.03 or later.
  • Windows 2008 server, “Server Core” is not supported.
  • Windows 7 Home Premium, Home Basic and Starter editions are not supported.

Linux

Mascot will run on most Linux distributions, and is tested in-house on:

  • Debian 6
  • Debian 7
  • CentOS 6

If the number of processors (sockets) in the system is greater than the number of CPUs in the Mascot license, the kernel needs to be 2.6 or later.
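
To check the kernel requirement on an existing Linux machine, the release string can be compared against 2.6. The following Python sketch is one possible way to do this and is not a Mascot tool. It assumes the release string begins with "major.minor", as it does on the distributions listed above, and is only meaningful when run on Linux:

```python
import platform
import re

MIN_KERNEL = (2, 6)  # minimum kernel version quoted in the text


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '2.6.32-5-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_ok(release: str) -> bool:
    """Return True if the kernel release is 2.6 or later."""
    return kernel_version(release) >= MIN_KERNEL


if __name__ == "__main__":
    release = platform.release()  # kernel release of the running system
    status = "OK" if kernel_ok(release) else "older than 2.6"
    print(f"Kernel {release}: {status}")
```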

Web Server Software

Mascot requires a web server for administration and interactive use. In the case of Windows, Microsoft Internet Information Server (IIS) is the obvious choice unless you are committed to some other package. IIS is bundled with Windows 2000, XP Professional, Windows 2003 server, Windows Vista and Windows 2008 server.

The Mascot installation program automatically configures IIS versions 4 and later.

Apache is a good choice for Linux. Apache Version 2.0 can also be used under Windows.

Running a web browser on the same PC as the web server can take a surprising amount of processor time, so search times may suffer. If the same PC is also used for instrument control and data acquisition, you may need to adjust job priorities using Windows Task Manager to ensure that the instrument gets adequate priority.

Mascot Cluster Mode

A Mascot licence for 4 or more processors automatically supports operation on a cluster of systems connected by a dedicated 100 Base-T or Gigabit LAN. A cluster offers several advantages over a single, multi-processor system:

  • Mass market, reliable, low cost PC hardware can be used.
  • The cluster can be incrementally expanded as workload increases.
  • The RAM required to map sequence databases is distributed across multiple systems, circumventing the limits of a single system.
  • The limited bandwidth of the PC bus is effectively multiplied by the number of systems in the cluster.

Further product information including details of recommended IBM hardware is available here.