Sunday, May 17, 2009

Facebook's XHProf PHP module patch for Mac OS X Leopard Build

*) Note (05 Jun 2009): Thanks to Kannan from the Facebook dev team, the patch is now part of the official 0.9.2 release. You can download it from PECL.

Download the fully patched XHProf 0.9.1 for Mac OS X Leopard
Bug submitted on PECL.

Full story
-----------

Yesterday (15 May 2009) I ran into the XHProf module, made by Facebook and released as open source in March 2009.
As soon as I read about its features I knew I had to try it and eventually put it on production servers. I had been looking for this kind of PHP profiler with stats since 2005, when I found XDebug, which is really cool during development, but I wouldn't put it on a live server under high load like Tupalka.com's.

Later that day the server hosting Tupalka.com was hit by huge traffic. It got heavily loaded every time the eAccelerator caches expired, because most of the heavy MySQL queries have to be run once every 15 minutes. That forced me to try XHProf.
Yup, I was sure the problem was in the queries, so why was I going for XHProf? Good question. Curiosity.

Moments like these are the ones that push me to find and try new tools for optimizing web applications and web sites.
So I was desperate to put XHProf on the production server right then, but I wanted to take a look at it on my laptop for 30 minutes before going live.

And here starts my real story with XHProf.

I downloaded the latest PHP 5.2.9, copied the XHProf/extension folder into php_source/ext (as ext/xhprof), and ran:
#rm ./configure; ./buildconf --force; ./configure --prefix=/usr/local/php --with-xhprof; make

and ... it did not compile. The make exited with:

php_source/ext/xhprof/xhprof.c:202: error: syntax error before 'cpu_set_t'
php_source/ext/xhprof/xhprof.c:202: warning: no semicolon at end of struct or union
php_source/ext/xhprof/xhprof.c:213: error: syntax error before '}' token
php_source/ext/xhprof/xhprof.c:213: warning: data definition has no type or storage class
php_source/ext/xhprof/xhprof.c:222: error: syntax error before 'hp_globals'
php_source/ext/xhprof/xhprof.c:222: warning: data definition has no type or storage class

"Ooohh, gosh", I thought, and opened xhprof.c.
It turned out I had misread the docs, because when I opened them again I saw:

"Note: A windows port hasn't been implemented yet. We have tested xhprof on Linux/FreeBSD so far.

Note: XHProf uses the RDTSC instruction (time stamp counter) to implement a really low overhead timer for elapsed time. So at the moment xhprof only works on x86 architecture. Also, since RDTSC values may not be synchronized across CPUs, xhprof binds the program to a single CPU during the profiling period."
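
To make the RDTSC note concrete, here is a minimal sketch of reading the time stamp counter from C. It is not XHProf's own code: the names read_ticks and ticks_for_busy_loop are made up for illustration, and the non-x86 fallback is my assumption, added only so the sketch builds on other machines.

```c
#include <stdint.h>
#if !defined(__i386__) && !defined(__x86_64__)
# include <time.h>
#endif

/* Read a raw tick count. On x86 this is the time stamp counter that the
 * XHProf note describes. The clock_gettime() branch is NOT what XHProf
 * does; it is an assumption so the sketch compiles elsewhere. */
static uint64_t read_ticks(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    /* RDTSC returns the 64-bit counter split across EDX:EAX. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}

/* Hypothetical helper: number of ticks spent in a small busy loop. */
uint64_t ticks_for_busy_loop(unsigned long iterations)
{
    /* volatile keeps the compiler from optimizing the loop away */
    volatile unsigned long sink = 0;
    unsigned long i;
    uint64_t start = read_ticks();

    for (i = 0; i < iterations; i++)
        sink += i;
    return read_ticks() - start;
}
```

Each core has its own counter, so the difference between two reads is only meaningful if both happen on the same CPU. That is why the note says xhprof binds the program to a single CPU while profiling.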

Ahh, they haven't tested the module on Mac OS X. And why should they? All of their servers run on Linux or FreeBSD. It's clear.

OK, now I had to look and see if I could patch it.
(Oh, I forgot to mention that meanwhile I changed eAccelerator's cache expiry on the loaded server to 2 hours, so the load problem was resolved for a day or two and I had time to look at XHProf.)

The XHProf documentation clearly says that the module has to be bound to a single CPU, and I knew that thread affinity is implemented a little differently in Mac OS X than in Linux or FreeBSD. Maybe that was the problem?
Yup, it was. The Facebook team had added a special case for the affinity functions on FreeBSD, but that's it. Now I had to add a similar solution for OS X.

I started to read more about thread affinity in Leopard and found this document, which explains Apple's affinity API. After reading it again and again and looking for code snippets around the net, I got the solution:

Replace lines 30 to 49 of xhprof.c with this:

#ifdef __FreeBSD__
# if __FreeBSD_version >= 700110
#   include <sys/resource.h>
#   include <sys/cpuset.h>
#   define cpu_set_t cpuset_t
#   define SET_AFFINITY(pid, size, mask) \
           cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1, size, mask)
#   define GET_AFFINITY(pid, size, mask) \
           cpuset_getaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1, size, mask)
# else
#   error "This version of FreeBSD does not support cpusets"
# endif /* __FreeBSD_version */
#elif __APPLE__
/* For mach_thread_self() */
#   include <mach/mach_init.h>
/* For thread_policy_set(), thread_policy_get(), THREAD_AFFINITY_POLICY */
#   include <mach/thread_policy.h>
#   define cpu_set_t thread_affinity_policy_data_t
/* new_mask is a pointer. Tag 0 is THREAD_AFFINITY_TAG_NULL, hence cpu_id + 1. */
#   define CPU_SET(cpu_id, new_mask) \
           (*(new_mask)).affinity_tag = ((cpu_id) + 1)
/* Assignment, not comparison: reset the tag to "no affinity". */
#   define CPU_ZERO(new_mask) \
           (*(new_mask)).affinity_tag = THREAD_AFFINITY_TAG_NULL
#   define SET_AFFINITY(pid, size, mask) \
           thread_policy_set(mach_thread_self(), THREAD_AFFINITY_POLICY, \
                             (thread_policy_t)(mask), THREAD_AFFINITY_POLICY_COUNT)
/* Not expanded on OS X: the GET_AFFINITY call is wrapped in #ifndef __APPLE__.
 * thread_policy_get() really takes a pointer to the count plus a get_default
 * flag, so this macro would need changing before it could be used. */
#   define GET_AFFINITY(pid, size, mask) \
           thread_policy_get(mach_thread_self(), THREAD_AFFINITY_POLICY, mask, THREAD_AFFINITY_POLICY_COUNT)
#else
/* To enable CPU_ZERO and CPU_SET, etc.     */
# define __USE_GNU
/* For sched_getaffinity, sched_setaffinity */
# include <sched.h>
# define SET_AFFINITY(pid, size, mask) sched_setaffinity(0, size, mask)
# define GET_AFFINITY(pid, size, mask) sched_getaffinity(0, size, mask)
#endif /* __FreeBSD__ */
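
To show what these macros boil down to once expanded, here is a small standalone sketch that binds the calling thread to one CPU on Linux and sets an affinity tag on OS X. It is not taken from xhprof.c: first_allowed_cpu and bind_to_cpu are hypothetical names, and the FreeBSD branch is left out to keep it short.

```c
#if defined(__APPLE__)
# include <mach/mach.h>   /* umbrella header: mach_thread_self(), thread_policy_set() */
#else
/* glibc exposes CPU_SET and sched_setaffinity() only as GNU extensions. */
# ifndef _GNU_SOURCE
#  define _GNU_SOURCE
# endif
# ifndef __USE_GNU
#  define __USE_GNU       /* same trick xhprof.c uses */
# endif
# include <sched.h>
#endif

/* Return the lowest-numbered CPU this thread may run on, or -1 on error. */
int first_allowed_cpu(void)
{
#if defined(__APPLE__)
    return 0;   /* OS X has no per-CPU mask to query; tags are only hints */
#else
    cpu_set_t mask;
    int cpu;

    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) < 0)
        return -1;
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &mask))
            return cpu;
    return -1;
#endif
}

/* Pin (Linux) or hint (OS X) the calling thread to cpu_id.
 * Returns 0 on success, -1 on failure. */
int bind_to_cpu(int cpu_id)
{
#if defined(__APPLE__)
    thread_affinity_policy_data_t policy;

    if (cpu_id < 0)
        return -1;
    policy.affinity_tag = cpu_id + 1;   /* 0 is THREAD_AFFINITY_TAG_NULL */
    return thread_policy_set(mach_thread_self(), THREAD_AFFINITY_POLICY,
                             (thread_policy_t)&policy,
                             THREAD_AFFINITY_POLICY_COUNT) == KERN_SUCCESS ? 0 : -1;
#else
    cpu_set_t mask;

    if (cpu_id < 0 || cpu_id >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&mask);
    CPU_SET(cpu_id, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask);   /* pid 0 = calling thread */
#endif
}
```

A caller would run bind_to_cpu(first_allowed_cpu()) before starting to measure. On Linux this is a hard binding. On OS X the tag is only a scheduling hint, and the call can fail on hardware where the kernel does not support affinity policies.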


Then replace lines 375 to 378 with:
#ifndef __APPLE__
  if (GET_AFFINITY(0, sizeof(cpu_set_t), &hp_globals.prev_mask) < 0) {
    perror("getaffinity");
    return FAILURE;
  }
#endif

You can download the whole patched version of XHProf 0.9.1 from here.
Or get only the diff patch.

After a successful compile and run I think it works: a given thread keeps sharing the same L2 cache when running on multiple cores, but NOT on different CPUs!
At least you can compile the module, see what it does, and integrate it into your project. Then, on a live server running Linux/FreeBSD, you will definitely get correct data coming from XHProf.

If you find any errors or mistakes, please let me know.
I also submitted a bug about this on PECL.

Thanks in advance for your opinions and suggestions.