TL;DR: I have written a CLI utility for Ubuntu that imports ModSecurity's audit log file into an SQLite database, which should help people build a whitelist and reduce false positives. A PPA is available.
Even if you don't use Apache, you might find this interesting. To create my app I had to learn about C++ development on Ubuntu, including two third-party libraries (Boost Regex and SQLite), version control using Git, the GNU build system "Autotools", how to package software for Ubuntu and Debian, and how to upload packages to a Personal Package Archive (PPA) on Launchpad.
I hope this will spark some interesting discussion. I'd love to hear other people's experiences with any of the above, particularly Ubuntu development.
--------------------------------------------------------------------------------------
What is ModSecurity?
You may recall some conversations we had before on this forum about Apache2's security module "ModSecurity".
For those of you who haven't used it, ModSecurity is a Web Application Firewall that can be used with a set of rules to "enumerate badness" and decide when to block requests sent to the server. It sits in between Apache and the web applications running on the server, and can therefore intercept malicious requests before they are processed by the app. Probably the most common set of rules is the Open Web Application Security Project's Core Rule Set (OWASP CRS), which is in the Ubuntu repos as modsecurity-crs.
Here's a typical example: Mr Naughty is trying to hack example.com, a website running a vulnerable installation of WordPress on a LAMP server. He is trying to use an SQL injection attack to create a new admin user in the database so that he can deface the site, steal data etc. However, ModSecurity identifies the SQL injection attack contained in the POST variable sent by Mr Naughty and blocks it before it is executed by WordPress. The attack fails.
Sounds great, right?
Why isn't it more popular?
I started learning about ModSecurity after Steve recommended it to me, about a year and a half ago. As an enthusiastic but inexperienced amateur, I really struggled to configure it properly: each rule uses pattern matching to decide what to block, so false positives are inevitable.
This means you can't just install it and expect it to work. Typically you run ModSecurity in "detection only" mode for a time (rules are evaluated but ModSecurity doesn't actually block anything), and then inspect the audit logs to identify where you need to amend the rules to remove those false positives.
The audit log is a text file with sections for each part of the transaction: the data sent to the server, the response sent back, and any rules that were matched. Since the data for each transaction is split over multiple lines, it does not lend itself to being sorted with simple utilities like grep. Identifying all of the requests from a certain IP address that triggered a given rule is a non-trivial exercise.
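To make the layout concrete, here is a minimal Python sketch (not the code used by auditlog2db) that splits an audit log into transactions and sections. It assumes the serial log format, where each section starts with a boundary line such as --a1b2c3d4-A--, section A holds the timestamp, unique ID and addresses, and section H lists the matched rules. The sample entry, the rule ID 999001 and the helper names are all made up for illustration.

```python
import re

# A boundary line looks like "--a1b2c3d4-A--": a transaction id, then a section letter.
BOUNDARY = re.compile(r"^--([0-9a-fA-F]+)-([A-Z])--$")
RULE_ID = re.compile(r'\[id "(\d+)"\]')

# Invented sample entry; real logs contain more sections and more detail.
SAMPLE = """\
--a1b2c3d4-A--
[09/Jan/2015:12:00:00 +0000] VK9abcdEFG 203.0.113.5 51234 192.0.2.10 80
--a1b2c3d4-B--
POST /wp-login.php HTTP/1.1
Host: example.com

--a1b2c3d4-C--
log=admin&pwd=' OR '1'='1
--a1b2c3d4-H--
Message: Warning. Pattern match [file "/etc/modsecurity/example.conf"] [id "999001"] [msg "Example rule"]
--a1b2c3d4-Z--
"""


def parse_audit_log(text):
    """Return one dict per transaction, mapping section letter to section text."""
    transactions = []
    current, letter = None, None
    for line in text.splitlines():
        match = BOUNDARY.match(line)
        if match:
            letter = match.group(2)
            if letter == "A":          # section A opens a new transaction
                current = {}
                transactions.append(current)
            if current is not None and letter != "Z":
                current[letter] = []   # section Z only marks the end
            continue
        if current is not None and letter in current:
            current[letter].append(line)
    return [{k: "\n".join(v).strip() for k, v in t.items()} for t in transactions]


def source_ip(transaction):
    """Section A is: [timestamp] unique_id source_ip source_port dest_ip dest_port."""
    return transaction["A"].split("] ", 1)[1].split()[1]


def rule_ids(transaction):
    """Collect the ids of every rule mentioned in section H."""
    return RULE_ID.findall(transaction.get("H", ""))
```

Calling parse_audit_log(SAMPLE) gives a list with one transaction, and source_ip() and rule_ids() pull out the two fields you would need to answer the "which requests from this IP triggered this rule" question. Once every transaction is a record like this, filtering becomes easy; it is the multi-line text format that makes grep awkward.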
Initial Solutions
My first attempt at tackling the problem was to remove the rules that were being triggered at certain locations. To do this I wrote a Bash script, which you can find with a description on my website. The script doesn't look at the audit log file; it just uses the error messages ModSecurity writes to the Apache error log, and spits out a virtualhost configuration file listing locations (URLs) where certain rules are disabled.
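The idea can be sketched in a few lines of Python. This is a rough illustration rather than the Bash script itself, and it assumes that each ModSecurity error message carries [id "..."] and [uri "..."] fields and that exclusions are written as Location blocks using the SecRuleRemoveById directive. The function names are hypothetical.

```python
import re
from collections import defaultdict

URI = re.compile(r'\[uri "([^"]*)"\]')
RULE_ID = re.compile(r'\[id "(\d+)"\]')


def rules_by_location(error_log_lines):
    """Map each requested URI to the set of rule ids that fired there."""
    hits = defaultdict(set)
    for line in error_log_lines:
        if "ModSecurity:" not in line:
            continue  # ignore everything else Apache logs
        uri, rule = URI.search(line), RULE_ID.search(line)
        if uri and rule:
            hits[uri.group(1)].add(rule.group(1))
    return hits


def render_exclusions(hits):
    """Write one Location block per URI, disabling the rules seen there."""
    blocks = []
    for uri in sorted(hits):
        ids = " ".join(sorted(hits[uri]))
        blocks.append(
            '<Location "%s">\n    SecRuleRemoveById %s\n</Location>' % (uri, ids)
        )
    return "\n".join(blocks)
```

Feeding the lines of the error log to rules_by_location() and passing the result to render_exclusions() produces text that could be pasted into a virtualhost file. Note that this blindly disables every rule that has ever fired at a location, so it should only be run on logs you believe contain no real attacks.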
This would work OK if you were running ModSecurity in "traditional" mode, where any rule that is matched results in the request being blocked, but it isn't good for the new anomaly scoring mode (the one that enumerates badness). In the anomaly scoring mode, each rule has a point score and the request is blocked if the score passes over a threshold... I soon realised that my script above was actually just removing the rule that adds up the scores and blocks the request, when it should have been removing the individual rules!
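A toy model shows why removing the blocking rule is the wrong fix. This Python sketch uses invented rule names and scores, and assumes a request is blocked once the total reaches the threshold; it is not how ModSecurity is implemented internally.

```python
# Hypothetical rule names and scores, chosen only for illustration.
RULE_SCORES = {"sql_injection_pattern": 5, "missing_accept_header": 2}
THRESHOLD = 5  # assumed: block when the total reaches this value


def anomaly_score(matched, disabled=frozenset()):
    """Add up the scores of the matched rules that have not been disabled."""
    return sum(RULE_SCORES[rule] for rule in matched if rule not in disabled)


def is_blocked(matched, disabled=frozenset(), blocking_rule_enabled=True):
    """Decide whether a request is blocked in anomaly scoring mode."""
    if not blocking_rule_enabled:
        # Without the rule that compares the total to the threshold,
        # nothing is ever blocked, however high the score.
        return False
    return anomaly_score(matched, disabled) >= THRESHOLD
```

With both rules matched the score is 7 and the request is blocked. Disabling only the rule that produced the false positive (say sql_injection_pattern at one location) drops the score to 2 and lets the request through, while every other rule keeps working. Switching off the blocking rule also lets the request through, but it does the same for genuine attacks at that location.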
This wasn't good enough. I realised I needed a more fine-tuned approach, so I learned some Perl. Perl can do multiline regex (slowly!), which enabled me to look at the audit log instead of the error log. The Perl script I wrote splits the audit log into bits and puts it into a spreadsheet. This is the same fundamental approach as my C++ app, but the spreadsheet quickly becomes extremely sluggish, and the script takes ages to run. It does work, though!
The Solution: auditlog2db command-line utility
So, after my partial success with Perl I decided I needed something serious to tackle the problem. I had read that C++ apps are generally faster than scripting languages like Perl, and wanted to learn the language that so much of the software on our OS is written in. I had an idea that an SQLite database would be a good way to store the information from the audit logs so that it could be sorted quickly, but I didn't know any C++ or anything about SQLite.
I learned:
- Some basic C++ (hair-tearingly frustrating at times but ultimately rewarding)
- How to use the C/C++ SQLite API (reasonably well documented but very confusing to someone writing their first C++ app)
- How to do regular expression matching in C++ using the Boost Regex library (much more difficult than Perl!)
- How to use a Makefile to make compilation less tedious.
The result is a C++ command-line utility called auditlog2db that will import the log file into an SQLite 3 database. It can process about 2000 transactions per second, which is about a bazillion times faster than the Perl script.
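To show why a database helps, here is a small Python sketch using the standard sqlite3 module. The single "matches" table and its column names are assumptions made up for this example and are not the schema that auditlog2db creates.

```python
import sqlite3


def build_db(rows):
    """Create an in-memory database from (unique_id, source_ip, uri, rule_id) tuples."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE matches (unique_id TEXT, source_ip TEXT, uri TEXT, rule_id TEXT)"
    )
    db.executemany("INSERT INTO matches VALUES (?, ?, ?, ?)", rows)
    db.commit()
    return db


def requests_matching(db, source_ip, rule_id):
    """List the requests from one IP address that triggered one rule."""
    cursor = db.execute(
        "SELECT unique_id, uri FROM matches "
        "WHERE source_ip = ? AND rule_id = ? ORDER BY unique_id",
        (source_ip, rule_id),
    )
    return cursor.fetchall()
```

The question that was a non-trivial exercise with grep becomes a single SELECT with a WHERE clause, and grouping or counting by rule, URI or IP address is just a matter of changing the query.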
As my code got more complicated, I realised I needed to use a proper version control system instead of just saving copies of files as foo.BAK, foo.BAK2 ... so I learned Git. Git is actually quite accessible and definitely worth learning.
Packaging
So, at this point my code was on GitHub and it worked, but I doubted very much whether anyone would find it and use it. Seriously... in 2015, you shouldn't have to compile a program yourself unless you're actually developing it.
Packaging my code for Ubuntu/Debian turned out to be almost as difficult as writing the damn program!
I started by learning the GNU build system, Autotools, to replace my handwritten Makefile with a more flexible one. Autotools is the group of programs used in the classic "configure, make, make install" procedure to check dependencies and create a Makefile that installs everything to the correct place on your system and removes it cleanly again afterwards.
Autotools turned out to be a nightmare. It is not at all easy to learn - something as simple as testing for C++11 support in the compiler and setting the appropriate flag should be easy, but it's not, and requires the use of some pretty archaic m4 macros. The documentation is sparse, and non-trivial example tutorials are hard to come by.
In defence of Autotools, once it is set up, the "configure, make, make install" procedure is easy to do - I can see why it was good in the days when end users were required to compile software. It also provides some nice features like "make dist", which creates a .tar.gz source archive for distribution - useful for starting a .deb! If I could start over, I think I would learn CMake, which is supposed to be easier.
Once this was all sorted, I set to work building a .deb package. The Ubuntu documentation can pretty much be summed up in one sentence:
Build a package as you would for Debian, but use the Ubuntu release codename (utopic) instead of the Debian codename (unstable) in the changelog file.
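After battling my way through the Debian New Maintainers' Guide, I finally produced a package. However, the package checker lintian kicked up a load of errors. Some were trivial, like line lengths in the package description, but others were more serious: a manual file was missing.
Yet another unpleasant surprise: manual files are written using nroff, a markup language even more difficult to learn than TeX, and a lot less useful. Luckily, it was possible to just borrow a lot of the markup from other man pages, which are stored as /usr/share/man/man1/foo.1.gz. Take a look at a few using the "zless" command, and you'll see why I wasn't enthused at the prospect of writing one from scratch.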
Distribution
Package completed, error free, it was time to upload to a PPA.
The next surprise was that you can't just create a .deb, sign it, and upload it to a PPA. Launchpad builds the binaries itself from a source archive!
This is great for quality control, since the packages are built in a sanitised chroot. In fact, this caught a few of my errors, like missing libsqlite3-dev and libboost-regex-dev from the Build-Depends field in the control file. These libraries were (obviously) installed on my laptop already, but they weren't present in the chroot, so the compiler failed during linking.
After a bit of trial and error, I got Launchpad to build my app successfully.
My PPA is here:
https://launchpad.net/~sam-hobbs/+ar...elisting-tools
At the moment, packages are only available for 14.10 Utopic Unicorn.
If you fancy helping me out (I'd really appreciate it!), you can add the PPA, install the package, test it and remove it.
The package is called ams-whitelisting-tools because I plan on adding other utilities to the package later.
Obligatory warning: in general, you shouldn't add random PPAs to your system. Only do this if you trust me!
The following code will add the PPA, install my utility, do some basic tests, and then remove the package and the PPA.
Code:
sudo add-apt-repository ppa:sam-hobbs/ams-whitelisting-tools
sudo apt-get update
sudo apt-get install ams-whitelisting-tools
man auditlog2db
auditlog2db --version
sudo apt-get remove --purge ams-whitelisting-tools
sudo add-apt-repository --remove ppa:sam-hobbs/ams-whitelisting-tools
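If you have ModSecurity installed, you can also run this command to generate a database from your audit log file:
Code:
auditlog2db -i /var/log/apache2/modsec_audit.log -o ~/modsecurity.db
The utility is still very much in development, but it is at the stage now where it could be useful to people and I'm very pleased to be able to release something!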
If you do any serious testing, I'd love to hear some feedback. I'm aware that I need to tighten up the --force and --quiet options, which were added recently.
Thanks
I only started using Linux a couple of years ago, and have learned a huge amount since then, thanks mainly to this forum and the patient help I have received here.
Thanks especially to Steve for getting me interested in ModSecurity, and GreyGeek for encouraging me to learn C++!