Friday, December 20, 2013

Unexpected unloading of mono web application


After several bugs in the Mono GC were fixed, I was able to run benchmarks for an aspx page on an apache2+mod-mono server. I used Mono from the master branch; mono --version says: "Mono Runtime Engine version 3.2.7 (master/01b7a50 Sat Dec 14 01:48:49 NOVT 2013)". The SIGSEGV crashes went away, but unfortunately I can't say that serving aspx with apache2 is stable now. Twice during the benchmarks I got something similar to a deadlock: mono stopped processing requests and got stuck consuming 100% of the CPU. I don't know what that was; my attempt to debug the mono process with GDB did not bring an answer (unlike other cases, when GDB helped me find the cause of deadlocks/SIGSEGVs, or at least the suspicious code, and send this info to the Mono team). There are also memory leaks. And there is one more bad thing: the server stops responding after processing ~160,000 requests, but there is a workaround for it.

Mono .aspx 160K requests limit

If you run ab -n 200000 http://yoursite/hello.aspx, where hello.aspx is a simple aspx page which does nothing and the site is served under apache mod-mono, then after ~160K requests you'll get a denial of service. This error has several causes; I'll try to explain what is going on and how to avoid it.

When a request comes to an aspx page, the web server creates a new session. Then the session is saved to the internal web cache. When the second request comes, the server tries to read the session cookie and, if it is not found, creates and saves a new session to the cache again. So every request without cookies creates a new session object in the cache. This could produce huge memory leaks, because the number of sessions grows without bound; to prevent this, the web server has a maximum limit on the number of objects the internal web cache can store. This limit is defined as a constant in Cache.cs and is hardcoded to 15000.
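For reference, the limit looks roughly like this (a paraphrased sketch, not the literal Mono source; the LOW_WATERMARK value is my assumption, the 15000 limit is the one discussed above):

// Simplified sketch of the hard-coded limits in System.Web.Caching.Cache
sealed class Cache
{
 // assumed value: eviction stops once the cache shrinks back to this size
 const int LOW_WATERMARK = 10000;
 // the hard-coded limit: once the item count passes this, aggressive eviction starts
 const int HIGH_WATERMARK = 15000;
 // ...
}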

When the number of objects in the internal cache hits 15000, the web server starts aggressively deleting objects from the cache using an LRU strategy. So if a user got his session 5 minutes ago and works with the site by clicking a page every minute, his session will be removed from the cache (and all the data inside the session lost), while some hazardous script (with no session cookie set) which made 15K requests to the page during the last minute has just created 15K empty sessions. But this is not all.

The internal cache is also used for storing some important server objects; for example, all dynamically compiled assemblies are stored there. And there is no preference for server objects when deleting from the cache: all objects are equal. So if some server object has not been accessed for too long, it will be removed. And this is the cause of the second error.

Here is the code of the GetCompiledAssembly() method. It's called every time the page is accessed:

   string vpabsolute = virtualPath.Absolute;
   if (is_precompiled) {
    Type type = GetPrecompiledType (vpabsolute);
    if (type != null)
     return type.Assembly;
   }
   BuildManagerCacheItem bmci = GetCachedItem (vpabsolute);
   if (bmci != null)
    return bmci.BuiltAssembly;

   Build (virtualPath);
   bmci = GetCachedItem (vpabsolute);
   if (bmci != null)
    return bmci.BuiltAssembly;
   
   return null;

Let's look at it. When an .aspx page is accessed for the first time, the method checks whether the page was precompiled. If it was, the precompiled type's assembly is returned. If not, it tries to find the compiled page in the internal cache, and if it is not found there it compiles the page and stores the compiled type in the cache (inside the Build() function). The scheme looks good, but not in our case. When the internal cache grows over the 15K limit, the compiled type is removed from the cache even if it was accessed just now! I think there is a bug in the LRU implementation, or maybe the object is fetched from the LRU only once and saved into some temporary variable, so the LRU entry never updates its last access time.

You may ask: "So what? The compiled type was deleted from the cache, but won't it be there again on the next page request? The algorithm checks for the type in the cache, and if it is not found it compiles it again and places it into the cache. That could reduce performance, but it cannot be the reason for a denial of service." And you'd be right: this is not exactly the reason for the DoS. But if you look inside the page compilation code, you'll find that it has a limit on the number of recompilations. When this limit is reached, it unloads the AppDomain together with the whole application! And on top of that, mod-mono somehow does not handle the AppDomain unloading (I don't know whether it even should), so after ~160K requests the page stops responding.

try {
 BuildInner (vp, cs != null ? cs.Debug : false);
 if (entryExists && recursionDepth <= 1)
  // We count only update builds - first time a file
  // (or a batch) is built doesn't count.
  buildCount++;
} finally {
 // See http://support.microsoft.com/kb/319947
 if (buildCount > cs.NumRecompilesBeforeAppRestart)
  HttpRuntime.UnloadAppDomain ();
 recursionDepth--;
}

How can this be worked around?
I know only one way: always use a precompiled web site. At first I hoped that the LOW_WATERMARK and HIGH_WATERMARK cache constants could be changed by setting an appropriate environment variable, but unfortunately they can't. In my opinion the cache usage should be rewritten: user sessions and web server internal objects should have separate storage and must not affect each other. Also, a session should not be created on the first page access; if the page doesn't ask for the session object, it can be created later, when it is really needed for processing the page.

Wednesday, December 11, 2013

ServiceStack performance on mono part 4


Today I again tried to increase the performance of ServiceStack on Mono. In the first part I noted that the profiler showed a large number of calls to, and a lot of execution time in, the Hashtable:GetHash(), SimpleCollator:CompareInternal() and Char:ToLower() methods. To understand why these methods are so slow I checked the call stack and found that most of the calls are made from the HttpHeadersCollection class. I looked inside the source and saw that HttpHeadersCollection uses InvariantCultureIgnoreCase string comparison instead of OrdinalIgnoreCase, which is more suitable for comparing header names (because they do not need to be linguistically equivalent) and should be faster.
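To illustrate the kind of change, here is a minimal sketch (not the actual Mono class; HttpHeadersCollection derives from NameValueCollection, which accepts an equality comparer in its constructor, and the class name below is simplified):

using System;
using System.Collections.Specialized;

// Sketch: switch the header collection's key comparer from
// InvariantCultureIgnoreCase to OrdinalIgnoreCase.
class HeadersCollection : NameValueCollection
{
 public HeadersCollection ()
  // was: base (StringComparer.InvariantCultureIgnoreCase)
  : base (StringComparer.OrdinalIgnoreCase)
 {
 }
}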

To be sure about Hashtable and Dictionary performance with the various StringComparer options, I wrote a simple benchmark. It adds 100,000 strings and then tries to get them one by one, for every StringComparer option. The original idea for the test code I took from here; my test is slightly modified.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Collections;

namespace DictPerfomanceTest
{
 class ComparerInfo
 {
  public string Name { get; set;}

  public StringComparer Comparer { get; set;}

  public ComparerInfo(string name, StringComparer comparer)
  {
   Name = name;
   Comparer = comparer;
  }
 }

 class MainClass
 {
  const int nCount=100000;
  const string prefix = "SomeSomeString";

  static readonly ComparerInfo[] Comparers=new ComparerInfo[]
  {
   new ComparerInfo("CurrentCulture",StringComparer.CurrentCulture),
   new ComparerInfo("CurrentCultureIgnoreCase",StringComparer.CurrentCultureIgnoreCase),
   new ComparerInfo("InvariantCulture",StringComparer.InvariantCulture),
   new ComparerInfo("InvariantCultureIgnoreCase",StringComparer.InvariantCultureIgnoreCase),
   new ComparerInfo("Ordinal",StringComparer.Ordinal),
   new ComparerInfo("OrdinalIgnoreCase",StringComparer.OrdinalIgnoreCase)
  } ;

  public static void Main (string[] args)
  {
   foreach(var ci in Comparers)
   {
    Console.WriteLine ("Hashtable: {0}", ci.Name);
    Run (new Hashtable (ci.Comparer));
   }

   foreach(var ci in Comparers)
   {
    Console.WriteLine ("Dictionary: {0}", ci.Name);
    Run (new Dictionary<string,string> (ci.Comparer));
   }
  }

  private static void Run(Hashtable hashtable)
  {
   for(int i = 0; i < nCount; i++)
   {
    hashtable.Add(prefix+i.ToString(), i.ToString());
   }

   Stopwatch sw = new Stopwatch();
   sw.Start();
   for (int i = 0; i < nCount; i++)
   {
    string a = (string)hashtable[prefix+i.ToString()];
   }
   sw.Stop();
   Console.WriteLine("Time: {0} ms", sw.ElapsedMilliseconds);
  }

  private static void Run(Dictionary<string, string> dictionary)
  {
   for(int i = 0; i < nCount; i++)
   {
    dictionary.Add(prefix+i.ToString(), i.ToString());
   }

   Stopwatch sw = new Stopwatch();
   sw.Start();
   for (int i = 0; i < nCount; i++)
   {
    string a = dictionary[prefix+i.ToString()];
   }
   sw.Stop();
   Console.WriteLine("Time: {0} ms", sw.ElapsedMilliseconds);
  }

 }
}

Comparison Option            Hashtable time (ms)    Dictionary time (ms)
CurrentCulture               19 131                 16 030
CurrentCultureIgnoreCase     20 458                 16 587
InvariantCulture             18 359                 15 161
InvariantCultureIgnoreCase   21 128                 16 192
Ordinal                      58                     46
OrdinalIgnoreCase            73                     73

What can I say? Don't use InvariantCulture or culture-dependent comparison on Mono if you don't really need it! In most cases, when you use a string as a dictionary key you can safely use the Ordinal or OrdinalIgnoreCase string comparison options. For example, Redis cache key names, paths, and names of configuration elements in xml are good candidates for Ordinal comparison. By default, Dictionary and Hashtable compare string keys ordinally, but don't forget to pass these options to the String.Compare(), String.StartsWith() and String.EndsWith() methods if you want your software to run faster and more predictably.
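For example, a minimal sketch of passing the comparison explicitly (the key names here are just made-up examples):

using System;
using System.Collections.Generic;

class OrdinalUsageExample
{
 public static void Main ()
 {
  // Redis-style cache keys: ordinal comparison is enough and is much cheaper
  var cache = new Dictionary<string, string> (StringComparer.Ordinal);
  cache ["urn:profile:42"] = "cached value";

  // Pass the comparison explicitly instead of relying on the culture-aware default
  string header = "Content-Type: text/html";
  bool isContentType = header.StartsWith ("content-type", StringComparison.OrdinalIgnoreCase);
  int cmp = String.Compare ("abc", "ABC", StringComparison.Ordinal);

  Console.WriteLine ("{0} {1} {2}", cache.Count, isContentType, cmp);
 }
}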

A very good explanation of the differences between InvariantCulture and Ordinal comparison can be found here. In two lines of code it looks like this:

Console.WriteLine(String.Equals("æ", "ae", StringComparison.Ordinal)); // Prints false  
Console.WriteLine(String.Equals("æ", "ae", StringComparison.InvariantCulture)); // Prints true

I changed HttpHeadersCollection in this commit and made a pull request to Mono. I hope it will be reviewed and approved. I am also going to change the hashing functions for HttpRequest headers: first tests show a 3x to 6x performance improvement for the ordinal case-insensitive hash function, without any changes to the hashing algorithm itself.

Links:


ServiceStack performance in mono. Part 1
ServiceStack performance in mono. Part 2
ServiceStack performance in mono. Part 3

Thursday, December 5, 2013

ServiceStack performance in mono part 3


In the previous post I benchmarked various Mono HTTP backends on Linux and found that the Nginx+mono-server-fastcgi pair is very slow compared to the others. There was a several-fold difference in the number of served requests per second! So two questions arose: first, "Why is it so slow?" and second, "What can be done to improve performance?". In this post I'll try to answer both.

Why is it so slow?

Let's profile the fastcgi mono server. Remember that profiling can be enabled by setting the appropriate MONO_OPTIONS environment variable; if you don't remember how, you can read about web server profiling options in the first part.

After running the profiler I got these results:

Total(ms) Self(ms)      Calls Method name
  243637        4       1002 (wrapper remoting-invoke-with-check) Mono.WebServer.FastCgi.ApplicationHost:ProcessRequest (Mono.WebServer.FastCgi.Responder)
  140963        4        591 (wrapper runtime-invoke) :runtime_invoke_void__this___object (object,intptr,intptr,intptr)
  140863       60        501 Mono.FastCgi.Server:OnAccept (System.IAsyncResult)
  140570       25        501 Mono.FastCgi.Connection:Run ()
  129977        3        501 Mono.FastCgi.Request:AddInputData (Mono.FastCgi.Record)
  129971        5        501 Mono.FastCgi.ResponderRequest:OnInputDataReceived (Mono.FastCgi.Request,Mono.FastCgi.DataReceivedArgs)
  129964        0        501 Mono.FastCgi.ResponderRequest:Worker (object)
  129963        1        501 Mono.WebServer.FastCgi.Responder:Process ()
  129959       34        501 (wrapper xdomain-invoke) Mono.WebServer.FastCgi.ApplicationHost:ProcessRequest (Mono.WebServer.FastCgi.Responder)
  122777        3        501 (wrapper xdomain-dispatch) Mono.WebServer.FastCgi.ApplicationHost:ProcessRequest (object,byte[]&,byte[]&)
  113673        3        501 Mono.WebServer.FastCgi.ApplicationHost:ProcessRequest (Mono.WebServer.FastCgi.Responder)
  112227       14        501 Mono.WebServer.BaseApplicationHost:ProcessRequest (Mono.WebServer.MonoWorkerRequest)
  112205        2        501 Mono.WebServer.MonoWorkerRequest:ProcessRequest ()
  111942        2        501 System.Web.HttpRuntime:ProcessRequest (System.Web.HttpWorkerRequest)
  111761        3        501 System.Web.HttpRuntime:RealProcessRequest (object)
  111745       11        501 System.Web.HttpRuntime:Process (System.Web.HttpWorkerRequest)
  110814        7        501 System.Web.HttpApplication:System.Web.IHttpHandler.ProcessRequest (System.Web.HttpContext)
  110785        7        501 System.Web.HttpApplication:Start (object)
  110148       14        501 System.Web.HttpApplication:Tick ()
  110133      346        501 System.Web.HttpApplication/c__Iterator1:MoveNext ()
   73347       92       6012 System.Web.HttpApplication/c__Iterator0:MoveNext ()
   64025       32        501 System.Web.Security.FormsAuthenticationModule:OnAuthenticateRequest (object,System.EventArgs)
   62704      141      21042 Mono.WebServer.FastCgi.WorkerRequest:GetKnownRequestHeader (int)
   62550      250      45647 System.Runtime.Serialization.Formatters.Binary.ObjectReader:ReadObject (System.Runtime.Serialization.Formatters.Binary.BinaryElement,System.IO.BinaryReader,long&,object&,System.Runtime.Serialization.SerializationInfo&)
   62273        5       1002 System.Web.HttpRequest:get_Cookies ()
   62203      134      20040 Mono.WebServer.FastCgi.WorkerRequest:GetUnknownRequestHeaders ()
   56381        6       1002 (wrapper remoting-invoke-with-check) Mono.WebServer.FastCgi.Responder:GetParameters ()
   56373       34        501 (wrapper xdomain-invoke) Mono.WebServer.FastCgi.Responder:GetParameters ()
   54634      368      44653 System.Runtime.Serialization.Formatters.Binary.ObjectWriter:WriteObjectInstance (System.IO.BinaryWriter,object,bool)
   51554       16       1514 System.Runtime.Serialization.Formatters.Binary.BinaryFormatter:Deserialize (System.IO.Stream)
   51537       47       1514 System.Runtime.Serialization.Formatters.Binary.BinaryFormatter:NoCheckDeserialize (System.IO.Stream,System.Runtime.Remoting.Messaging.HeaderHandler)
   51531       34      12007 System.Runtime.Remoting.RemotingServices:DeserializeCallData (byte[])
   50521       19       1514 System.Runtime.Serialization.Formatters.Binary.ObjectReader:ReadObjectGraph (System.Runtime.Serialization.Formatters.Binary.BinaryElement,System.IO.BinaryReader,bool,object&,System.Runtime.Remoting.Messaging.Header[]&)
   48246       46       7536 System.Runtime.Serialization.Formatters.Binary.ObjectReader:ReadNextObject (System.IO.BinaryReader)
   47020      999      54096 System.Runtime.Serialization.Formatters.Binary.ObjectReader:ReadValue (System.IO.BinaryReader,object,long,System.Runtime.Serialization.SerializationInfo,System.Type,string,System.Reflection.MemberInfo,int[])
   35051      143      22013 System.Runtime.Remoting.RemotingServices:SerializeCallData (object)
   34198        7       1516 System.Runtime.Serialization.Formatters.Binary.BinaryFormatter:Serialize (System.IO.Stream,object)
   34190       15       1516 System.Runtime.Serialization.Formatters.Binary.BinaryFormatter:Serialize (System.IO.Stream,object,System.Runtime.Remoting.Messaging.Header[])
   33354       28       1516 System.Runtime.Serialization.Formatters.Binary.ObjectWriter:WriteObjectGraph (System.IO.BinaryWriter,object,System.Runtime.Remoting.Messaging.Header[])
   33253       78       1516 System.Runtime.Serialization.Formatters.Binary.ObjectWriter:WriteQueuedObjects (System.IO.BinaryWriter)
   29792      539      16549 System.Runtime.Serialization.Formatters.Binary.ObjectWriter:WriteObject (System.IO.BinaryWriter,long,object)
   28486      656      49652 System.Runtime.Serialization.Formatters.Binary.ObjectWriter:WriteValue (System.IO.BinaryWriter,System.Type,object)
   26041      101        501 System.Runtime.Serialization.Formatters.Binary.ObjectReader:ReadGenericArray (System.IO.BinaryReader,long&,object&)
   24552       16        501 System.Web.HttpApplication:PipelineDone ()
   23851       58        501 System.Web.HttpApplication:OutputPage ()
   23782       20        501 System.Web.HttpResponse:Flush (bool)
   23079      598      16539 System.Runtime.Serialization.Formatters.Binary.ObjectReader:ReadObjectContent (System.IO.BinaryReader,System.Runtime.Serialization.Formatters.Binary.ObjectReader/TypeMetadata,long,object&,System.Runtime.Serialization.SerializationInfo&)
   22542       24        501 (wrapper xdomain-dispatch) Mono.WebServer.FastCgi.Responder:GetParameters (object,byte[]&,byte[]&)
   19536       39       3030 System.Runtime.Serialization.Formatters.Binary.ObjectWriter:WriteArray (System.IO.BinaryWriter,long,System.Array)
   18377      105        501 System.Runtime.Serialization.Formatters.Binary.ObjectWriter:WriteGenericArray (System.IO.BinaryWriter,long,System.Array)

In the profile you can see a lot of binary serialization calls, which take most of the processing time. But if you look into the mono fastcgi code, you won't find any explicit calls to BinaryFormatter. What is going on? I hope you've already guessed what causes such serialization overhead; if not, let's look at the picture:

A new FastCGI request handler is created for every request from Nginx; the handler then looks up the corresponding web application by the HTTP_HOST server variable and, once the application is found, creates a new HttpWorkerRequest inside it and calls the Process method to process it. While processing, the web application communicates with the FastCGI request handler (asks for HTTP headers, returns the HTTP response and so on). Because the FastCGI request handler and the web application are located in different AppDomains, all calls between them go through remoting. Remoting binary-serializes the objects being passed, and this makes the application slow. I'd rather say remoting makes an application VERY VERY VERY SLOW if you pass complex types between endpoints. It's the prime evil of distributed applications which need to be performant. Don't use remoting if you have another way for your apps to communicate.
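To illustrate why such calls are expensive, here is a minimal sketch (the class and method names are simplified, not the exact mono-server-fastcgi types): a MarshalByRefObject is accessed through a remoting proxy from the other domain, and everything else crossing the boundary is binary-serialized on every call.

using System;

// The FastCGI handler and the ASP.NET application live in different AppDomains.
public class Responder : MarshalByRefObject
{
 // The returned array is not a MarshalByRefObject, so it is serialized with
 // BinaryFormatter and rebuilt in the calling domain on every invocation.
 public string[] GetParameters ()
 {
  return new[] { "HTTP_HOST=ssbench3", "REQUEST_METHOD=GET" };
 }
}

class CrossDomainDemo
{
 public static void Main ()
 {
  AppDomain webAppDomain = AppDomain.CreateDomain ("web-app");

  // CreateInstanceAndUnwrap returns a transparent proxy: calls on it
  // go through the remoting infrastructure, not directly.
  var proxy = (Responder) webAppDomain.CreateInstanceAndUnwrap (
   typeof (Responder).Assembly.FullName,
   typeof (Responder).FullName);

  // Looks like a normal call, but the result crosses the domain
  // boundary via binary serialization.
  Console.WriteLine (proxy.GetParameters () [0]);

  AppDomain.Unload (webAppDomain);
 }
}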

OK, we found that the fastcgi server actively uses remoting internally and that this can reduce performance. But is remoting the only thing which dramatically reduces performance? Maybe the FastCGI protocol itself is very slow and we can't have a fast and reliable mono web server behind nginx at all?

To check this I decided to write a simple application based on the mono-server-fastcgi source code. The application should instantly return a "Hello, world!" http response for every http request, without using remoting. If I could write such an app and it was significantly more performant, that would prove that a faster and more reliable web server could be created.

Proof of concept

I took the FastCGI server sources and wrote my own network server based on async sockets. From the old sources I kept only the FastCGI record parser; everything else I got rid of (a much simplified sketch of the approach is shown below). After the simple app was completed, I ran the benchmarks.
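The core of the proof of concept is nothing more than an asynchronous accept/receive loop. A much simplified version (hard-coded response, no real FastCGI record handling, arbitrary port) looks like this:

using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

// Minimal async-socket server sketch: accept connections, read whatever arrives
// and answer every request with a fixed body. The real proof of concept parsed
// FastCGI records here instead of ignoring the input.
class AsyncSocketServerSketch
{
 static readonly byte[] Response = Encoding.ASCII.GetBytes (
  "HTTP/1.1 200 OK\r\nContent-Length: 13\r\n\r\nHello, world!");

 public static void Main ()
 {
  var listener = new Socket (AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
  listener.Bind (new IPEndPoint (IPAddress.Loopback, 9000));
  listener.Listen (128);
  listener.BeginAccept (OnAccept, listener);
  Thread.Sleep (Timeout.Infinite);
 }

 static void OnAccept (IAsyncResult ar)
 {
  var listener = (Socket) ar.AsyncState;
  Socket client = listener.EndAccept (ar);
  listener.BeginAccept (OnAccept, listener); // keep accepting new connections

  var buffer = new byte[8192];
  client.BeginReceive (buffer, 0, buffer.Length, SocketFlags.None, result =>
  {
   if (client.EndReceive (result) > 0)
    client.Send (Response); // fixed reply for every request
   client.Shutdown (SocketShutdown.Both);
   client.Close ();
  }, null);
 }
}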

Before publishing the results, let's recall the benchmarks of mono-server-fastcgi made in the previous post.

Configuration                        requests/sec    Standard deviation    std dev %    Comments
Nginx+fastcgi-server+ServiceStack    571.36          8.81                  1.54         Memory Leaks
Nginx+fastcgi-server hello.html      409.48          9.14                  2.23         Memory Leaks
Nginx+fastcgi-server hello.aspx      458.55          9.89                  2.16         Memory Leaks, Crashes
Nginx+proxy xsp4+ServiceStack        1402.33         45.42                 3.24         Unstable Results, Errors

These benchmarks were made with the Apache ab tool using 10 concurrent requests. You can see that the fastcgi mono server performs 400-500 requests per second. In the new benchmarks I additionally vary the number of concurrent requests to see its influence on the results. The command was
ab -n 100000 -c <concurrency> http://testurl

Nginx configuration:

 server {
         listen   81;
         server_name  ssbench3;
         access_log   /var/log/nginx/ssbench3.log;
 
         location / {
                 root /var/www/ssbench3/;
                 index index.html index.htm default.aspx Default.aspx;
                 fastcgi_index Default.aspx;
                 fastcgi_pass 127.0.0.1:9000;
                 include /etc/nginx/fastcgi_params;
         }
}

Benchmark results:

Nginx fastcgi settings    Concurrency    Requests/Sec    Standard deviation    std dev %
TCP sockets               10             2619.56         49.95                 1.83
TCP sockets               20             2673.198        19.43                 0.72
TCP sockets               30             2681.166        15.83                 0.59

A significant difference, isn't it? These results give us hope that we can increase the throughput of the fastcgi server if we change the architecture and remove the remoting communication from it. And there is still room to increase performance further. Are you ready to go further?

Faster, higher, stronger

As the next step, I switched the communication between nginx and the server from TCP sockets to Unix sockets. Config and results:

 server {
         listen   81;
         server_name  ssbench3;
         access_log   /var/log/nginx/ssbench3.log;
 
         location / {
                 root /var/www/ssbench3/;
                 index index.html index.htm default.aspx Default.aspx;
                 fastcgi_index Default.aspx;
                 fastcgi_pass unix:/tmp/fastcgi.socket;
                 include /etc/nginx/fastcgi_params;
         }
}

Results

Nginx fastcgi settings    Concurrency    Requests/Sec    Standard deviation    std dev %
Unix sockets              10             2743.622        40.91                 1.49
Unix sockets              20             2952.244        67.86                 2.29
Unix sockets              30             2949.118        86.19                 2.92

This gained up to 5-10%. Not bad, but I want to push performance further, because when we replace the simple http response in the fastcgi request handler with a real ASP.NET Process method, we will lose a lot of performance.

One question whose answer could help increase performance: is there a way to keep the connection between nginx and the fastcgi server open instead of creating it for every request? In the configurations above, nginx requires the fastcgi server to close the connection to confirm the end of request processing. However, the FastCGI protocol has an EndRequest command, and keeping the connection open and using the EndRequest command instead of closing the connection could save a huge amount of time when processing small requests. Fortunately, nginx supports such a feature; it's called keepalive. I enabled keepalive and set the minimum number of open connections between nginx and my server to 32. I chose this number because it was higher than the maximum number of concurrent requests I made with ab.

 upstream fastcgi_backend {  
#   server 127.0.0.1:9000;
    server unix:/tmp/fastcgi.socket;
    keepalive 32;
}
 
 server {
         listen   81;
         server_name  ssbench3;
         access_log   /var/log/nginx/ssbench3.log;
 
         location / {
                 root /var/www/ssbench3/;
                 index index.html index.htm default.aspx Default.aspx;
                 fastcgi_index Default.aspx;
                 fastcgi_keep_conn on;
                 fastcgi_pass fastcgi_backend;
                 include /etc/nginx/fastcgi_params;
         }
}
Nginx fastcgi settings     Concurrency    Requests/Sec    Standard deviation    std dev %
TCP sockets, KeepAlive     10             3720.23         49.36                 1.33
TCP sockets, KeepAlive     30             3907.85         80.48                 2.06
Unix sockets, KeepAlive    10             4024.678        122.33                3.04
Unix sockets, KeepAlive    20             4458.714        72.87                 1.63
Unix sockets, KeepAlive    30             4482.648        19.40                 0.43

Wow! That is a huge performance gain! Up to 50% compared with the previous results! So I decided this was enough for the proof of concept and that I could start creating a faster fastcgi mono web server. To prove the point, I also made a simple .NET web server (without nginx) which always returns a "Hello, world!" http response and tested it with ab. It showed ~5000 reqs/sec, which is close to my fastcgi proof-of-concept server.
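That baseline server was trivial; a minimal sketch of such a "Hello, world!" server (using HttpListener; not the exact code I ran, the port is arbitrary) is:

using System;
using System.Net;
using System.Text;

// Minimal plain-HTTP baseline: answers every request with "Hello, world!".
// Used only to estimate the ceiling a managed web server can reach on this machine.
class HelloHttpServer
{
 public static void Main ()
 {
  var listener = new HttpListener ();
  listener.Prefixes.Add ("http://*:8080/");
  listener.Start ();

  byte[] body = Encoding.UTF8.GetBytes ("Hello, world!");
  while (true)
  {
   HttpListenerContext context = listener.GetContext (); // blocking accept
   context.Response.ContentLength64 = body.Length;
   context.Response.OutputStream.Write (body, 0, body.Length);
   context.Response.OutputStream.Close ();
  }
 }
}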

HyperFastCGI server

The target is clear now: create a fast and reliable fastcgi server for mono which can serve as many requests per second as possible and be stable. Unfortunately this cannot be done just by performance-tweaking the current mono fastcgi server; the architecture needs to be changed to avoid cross-domain calls while processing requests.

What I did:

  • I wrote my own connection handling using async sockets. This should also decrease processor usage, but I did not compare the servers by this parameter.
  • I totally rewrote the FastCGI packet parsing, trying to decrease the number of operations needed to handle the packets.
  • I changed the architecture by moving the FastCGI packet handling into the same domain where the web application is located.
  • Currently there are no known memory leaks when processing requests.
This helped to improve the performance of the server; here are the benchmarks:
Url                  Nginx fastcgi settings / Concurrency    Requests/Sec    Standard deviation    std dev %
/hello.aspx          TCP keepalive / 10                      1404.174        24.93                 1.78
/servicestack/json   TCP keepalive / 10                      1671.15         21.40                 1.28
/servicestack/json   TCP keepalive / 20                      1718.158        41.46                 2.41
/servicestack/json   TCP keepalive / 30                      1752.69         34.56                 1.97
/servicestack/json   Unix sockets keepalive / 10             1755.55         40.30                 2.30
/servicestack/json   Unix sockets keepalive / 20             1817.488        39.30                 2.16
/servicestack/json   Unix sockets keepalive / 30             1822.984        36.48                 2.00

The performance compared to the original mono fastcgi server increased several times! But this is not enough. While testing I found that threads were created and destroyed very often. Creating a thread is an expensive operation, so I decided to increase the minimal number of threads in the threadpool. I added a new option, /minthreads, to the server and set it to /minthreads=20,8, which means that there will be at least 20 running worker threads in the threadpool and 8 IO threads (for async socket communications).
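Under the hood this boils down to something like the following sketch (the actual option parsing in HyperFastCgi may look different):

using System;
using System.Threading;

class MinThreadsOption
{
 // Sketch of what /minthreads=20,8 effectively does: keep at least 20 worker
 // threads and 8 IO completion threads alive, so request bursts don't pay
 // the cost of spinning new threads up.
 static void ApplyMinThreads (string value) // e.g. "20,8"
 {
  string[] parts = value.Split (',');
  ThreadPool.SetMinThreads (int.Parse (parts [0]), int.Parse (parts [1]));
 }

 public static void Main ()
 {
  ApplyMinThreads ("20,8");
  int workers, io;
  ThreadPool.GetMinThreads (out workers, out io);
  Console.WriteLine ("min worker threads: {0}, min IO threads: {1}", workers, io);
 }
}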

/minthreads=20,8 benchmarks:

Url                  Nginx fastcgi settings / Concurrency    Requests/Sec    Standard deviation    std dev %
/servicestack/json   TCP keepalive / 10                      2041.246        23.18                 1.14
/servicestack/json   TCP keepalive / 20                      2070.08         10.95                 0.53
/servicestack/json   TCP keepalive / 30                      2093.526        24.27                 1.16
/servicestack/json   Unix sockets keepalive / 10             2156.754        37.74                 1.75
/servicestack/json   Unix sockets keepalive / 20             2182.774        42.96                 1.97
/servicestack/json   Unix sockets keepalive / 30             2268.676        28.39                 1.25

Such an easy change gives a performance boost of up to 20%!

Finally, I put the benchmarks of all nginx configurations into one chart.

To conclude: the HyperFastCgi server can be found on github. Currently it's not well tested, so use it at your own risk. But at least all the ServiceStack (v3) WebHosts.Integration tests which pass with XSP pass with HyperFastCgi too. To install HyperFastCgi simply do:

git clone https://github.com/xplicit/HyperFastCgi.git
cd HyperFastCgi
./autogen.sh --prefix=/usr && make
sudo make install

The configuration options are the same as for mono-server-fastcgi, plus a few new parameters:
/minthreads=nw,nio - minimal number of worker and IO threads
/maxthreads=nw,nio - maximal number of worker and IO threads
/keepalive=<true|false> - use the keepalive feature or not. Default is true
/usethreadpool=<true|false> - use the threadpool for processing requests. Default is true

If the HyperFastCgi server turns out to be interesting to others for production use, I am going to improve it further. What can be improved:

  • Support several virtual paths in one server. Currently only one web application is supported
  • Write unit tests to be sure that the server works properly
  • Catch and properly handle the UnloadDomain() command from ASP.NET. This command is raised when web.config is changed or during health checking by the web server. (Edit: already done)
  • Add a management and monitoring application which shows server statistics (number of requests served and so on) and recommends performance tweaks
  • Additional performance improvements

Links:
HyperFastCgi server source code
ServiceStack performance in mono. Part 1
ServiceStack performance in mono. Part 2

ServiceStack performance in mono. Part 4