Today I noticed an error message in a web application saying that PCRE was compiled without Unicode support. A quick pcretest -C showed that UTF-8 was supported but Unicode properties were not, which was apparently causing the problem.
# pcretest -C
PCRE version 6.6 06-Feb-2006
Compiled with
  UTF-8 support
  No Unicode properties support
To get this support, the PCRE package has to be rebuilt with Unicode properties enabled. That requires the source package for PCRE 6.6, which can be downloaded from any CentOS mirror with a simple wget, like this:
# wget http://mirrors.kernel.org/centos/5/os/SRPMS/pcre-6.6-2.el5_1.7.src.rpm
Now, to get this source RPM unpacked and customized, a so-called RPM build environment needs to be created. Instructions for that can be found on CentOS’s own website: http://wiki.centos.org/HowTos/SetupRpmBuildEnvironment
It’s pretty easy. Only the rpm-build package is needed (and make/gcc if you don’t have them yet), then create a ~/.rpmmacros file to specify the build path. Finally, the build path is populated with the folders mentioned on the CentOS page.
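The setup described above can be sketched roughly as follows. This is a minimal sketch, assuming ~/rpmbuild as the build path (the path used later in this post) and the usual five-directory layout; check the CentOS page if your setup differs. Installing rpm-build itself is left out.

```shell
# Assumption: the build path is ~/rpmbuild, matching the examples in this post.
topdir="$HOME/rpmbuild"

# Create the standard rpmbuild directory layout.
for d in BUILD RPMS SOURCES SPECS SRPMS; do
    mkdir -p "$topdir/$d"
done

# Point rpm at the build path, unless ~/.rpmmacros already defines %_topdir.
grep -q '^%_topdir' "$HOME/.rpmmacros" 2>/dev/null ||
    echo "%_topdir $topdir" >> "$HOME/.rpmmacros"
```

The grep guard keeps an existing %_topdir setting intact, so running the snippet twice does no harm.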
Now that the RPM build environment is in place, the source RPM can be installed there, by simply running:
# rpm -ivh pcre-6.6-2.el5_1.7.src.rpm
The source RPM has now installed its files into the build environment, so customizations can be made before the package is rebuilt. In this case we want to add Unicode properties support, which is very simple: just edit the spec file located at SPECS/pcre.spec within the build environment. Only one change is needed. The source package comes with only one configure option:
%configure --enable-utf8
Simply append --enable-unicode-properties to it:
%configure --enable-utf8 --enable-unicode-properties
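If you would rather script the edit above than open an editor, something like the following should work. It is only a sketch: the function name enable_unicode_properties is made up for this post, it assumes the %configure line in your spec file looks exactly like the one shown above, and it relies on GNU sed's -i option.

```shell
# Hypothetical helper: append --enable-unicode-properties to the
# %configure line of the spec file given as $1. Requires GNU sed (-i).
enable_unicode_properties() {
    spec="$1"
    # Do nothing if the option is already there, so reruns are safe.
    if grep -q -- '--enable-unicode-properties' "$spec"; then
        return 0
    fi
    # "&" re-inserts the matched text, so the option is appended to it.
    sed -i 's/^%configure --enable-utf8/& --enable-unicode-properties/' "$spec"
}

# Usage (path assumed from this post):
# enable_unicode_properties ~/rpmbuild/SPECS/pcre.spec
```

It is worth checking the spec file afterwards to confirm the line now reads as expected.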
Now that the modifications have been made, the RPM can be rebuilt by running:
# rpmbuild -ba ~/rpmbuild/SPECS/pcre.spec
Note that the RPM build environment path might be different in your case. In the above example it’s ~/rpmbuild.
Now that the RPM is rebuilt, you can upgrade the package by running:
# rpm -Uvh ~/rpmbuild/RPMS/i386/pcre-6.6-2.7.i386.rpm
Note that on a 64-bit system the path and file name will be different (x86_64 instead of i386). The output of rpmbuild tells you where the RPMs have been written.
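If the rpmbuild output has already scrolled away, the built package can also be located with find. A small sketch, assuming the build path from this post; list_built_rpms is a made-up name, and the pattern pcre-[0-9]*.rpm is an assumption meant to match only the main package and skip subpackages such as pcre-devel.

```shell
# Hypothetical helper: list the main pcre binary RPM(s) under a build path
# (defaults to ~/rpmbuild), whatever the architecture directory is called.
list_built_rpms() {
    find "${1:-$HOME/rpmbuild}/RPMS" -type f -name 'pcre-[0-9]*.rpm' | sort
}

# Usage:
# list_built_rpms
# list_built_rpms /some/other/topdir
```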
That’s it! Unicode properties support should now be enabled:
# pcretest -C
PCRE version 6.6 06-Feb-2006
Compiled with
  UTF-8 support
  Unicode properties support
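To run the same check from a script, the pcretest -C output can be filtered with grep. This is a sketch only, and has_unicode_properties is a made-up name. The one subtlety is that a plain search for "Unicode properties support" would also match the "No Unicode properties support" line, so the pattern is anchored to the start of the line.

```shell
# Hypothetical helper: read "pcretest -C" output on stdin and succeed only
# if Unicode properties support is reported. The pattern is anchored to the
# start of the line (after optional indentation) so that
# "No Unicode properties support" does not match.
has_unicode_properties() {
    grep -q '^[[:space:]]*Unicode properties support'
}

# Usage:
# pcretest -C | has_unicode_properties && echo "Unicode properties: yes"
```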