While reviewing some Apache log files the other day, I started to wonder whether somebody had already come up with a way to detect common attack characteristics from the information the logs contain. Although searching for entries containing “SELECT,” “xp_cmdshell,” and other attack terms can turn up useful information, it is difficult to cover all of the attack types and the different ways they can be represented within the log entries. Some Googling brought me to a project started by Romain Gaucher called apache-scalp, which is hosted on Google Code.
Scalp! is a log analyzer for the Apache web server that aims to look for security problems. The main idea is to look through huge log files and extract the possible attacks that have been sent through HTTP/GET (By default, Apache does not log the HTTP/POST variable).
This tool uses an event filter file created by and for the PHPIDS project. PHPIDS was created to parse web traffic and alert on anomalous activity as it occurs, instead of having to parse the log files afterward. To help in those instances where PHPIDS is not deployed, Scalp applies the same principle to the collected log information. Utilizing the PHPIDS signature file, default_filter.xml, Scalp will generate an alert file in the user’s choice of text, HTML, or XML format. This output, however, is alert data only and can be very cumbersome to review. When a lot of information is involved, human review is not a very effective way to determine trends and other specifics about the data.
<?xml version="1.0" encoding="utf-8"?>
<!--
File created by Scalp! by Romain Gaucher - http://code.google.com/p/apache-scalp
Apache log attack analysis tool based on PHP-IDS filters
-->
<scalp file="apache_log" time="Sat-27-Dec-2008">
<attack type="xss" name="Cross-Site Scripting">
<impact value="5">
<item>
<reason><![CDATA[Detects JavaScript with(), ternary operators and XML predicate attacks]]></reason>
<regexp><![CDATA[(?:with\([^)]*\)\))|(?:\.\s*source\W)|(?:\?[^:]+:[^;]+;)]]></regexp>
<line><![CDATA[xxx.28.xxx.249 - - [26/Aug/2008:00:00:13 -0700] "GET /d.AuthenticateUser1?p_page=http://webx.companyX.com/publish/01/na/en.html&p_HTTP_USER_AGENT=Microsoft%20Internet%20Explorer-4.0%20(compatible;%20MSIE%206.0;%20Windows%20NT%205.1;%20SV1;%20.NET%20CLR%201.1.4322) HTTP/1.0" 200 1547
]]></line>
</item>
<item>
<reason><![CDATA[Detects JavaScript with(), ternary operators and XML predicate attacks]]></reason>
<regexp><![CDATA[(?:with\([^)]*\)\))|(?:\.\s*source\W)|(?:\?[^:]+:[^;]+;)]]></regexp>
<line><![CDATA[xxx.16.xxx.158 - - [26/Aug/2008:00:00:19 -0700] "GET /d.AuthenticateUser1?p_page=http://webx.companyX.com/publish/01/na/en.html&p_HTTP_USER_AGENT=Microsoft%20Internet%20Explorer-4.0%20(compatible;%20MSIE%206.0;%20Windows%20NT%205.1;%20SV1;%20.NET%20CLR%201.1.4322) HTTP/1.1" 200 1570
]]></line>
</item>
<item>
<reason><![CDATA[Detects JavaScript with(), ternary operators and XML predicate attacks]]></reason>
<regexp><![CDATA[(?:with\([^)]*\)\))|(?:\.\s*source\W)|(?:\?[^:]+:[^;]+;)]]></regexp>
<line><![CDATA[xxx.84.xxx.90 - - [26/Aug/2008:00:00:41 -0700] "GET /d.AuthenticateUser1?p_page=http://webx.companyX.com/publish/01/vie/en.html&p_HTTP_USER_AGENT=Microsoft%20Internet%20Explorer-4.0%20(compatible;%20MSIE%206.0;%20Windows%20NT%205.1;%20SV1;%20.NET%20CLR%201.1.4322) HTTP/1.1" 200 1568
]]></line>
</item>
<item>
<reason><![CDATA[Detects JavaScript with(), ternary operators and XML predicate attacks]]></reason>
<regexp><![CDATA[(?:with\([^)]*\)\))|(?:\.\s*source\W)|(?:\?[^:]+:[^;]+;)]]></regexp>
<line><![CDATA[xxx.160.xxx.243 - - [26/Aug/2008:00:00:57 -0700] "GET /d.AuthenticateUser1?p_page=http://webx.companyX.com/publish/01/par/en.html&p_HTTP_USER_AGENT=Microsoft%20Internet%20Explorer-4.0%20(compatible;%20MSIE%206.0;%20Windows%20NT%205.1;%20SV1;%20.NET%20CLR%201.1.4322) HTTP/1.1" 200 1573
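To see why the entries in the sample above were flagged, it can help to run the reported regular expression against the request by hand. The following is only a minimal sketch: the regex and the request are copied from the first item in the sample, while the helper name `explain_match` is made up for illustration, and any preprocessing Scalp itself applies to a log line before matching is not reproduced here.

```python
import re

# Filter regex copied from the <regexp> element in the sample above.
RULE = re.compile(r"(?:with\([^)]*\)\))|(?:\.\s*source\W)|(?:\?[^:]+:[^;]+;)")

# Request portion of the first log entry in the sample above.
REQUEST = (
    "GET /d.AuthenticateUser1?p_page=http://webx.companyX.com/publish/01/na/en.html"
    "&p_HTTP_USER_AGENT=Microsoft%20Internet%20Explorer-4.0%20(compatible;"
    "%20MSIE%206.0;%20Windows%20NT%205.1;%20SV1;%20.NET%20CLR%201.1.4322) HTTP/1.0"
)


def explain_match(rule, text):
    """Return the substring that triggered the rule, or None if nothing matched.

    Hypothetical helper for this post; it is not part of Scalp or PHPIDS.
    """
    match = rule.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(explain_match(RULE, REQUEST))
```

In this case the matching text runs from the `?` that starts the query string to the first `;` inside the user agent value, so it is the third alternative of the pattern (the ternary operator check) that fires. That suggests an alert like this one may be triggered by an ordinary URL rather than by script content, which is one reason a summary of the alerts is useful before digging into individual entries.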
To help identify trends and other interesting information associated with alerts generated by Scalp, I have put together an external parser to generate a readable report. The Scalp External XML Reporter (SEXR) takes the information within a Scalp XML file and produces several outputs to either standard out or a text file.
The first output is the full parse of the generated alerts. Although this output can still contain a lot of information, it is much easier to read and understand than the original Scalp output. (The following output, like the rest of the output in this post, has been snipped for brevity and its format slightly altered to fit the text into this post.)
sexr.py: Conducting full scan of 1 files
scalp: {'file': 'apache_log', 'time': 'Sat-27-Dec-2008'}
attack: {'type': 'xss', 'name': 'Cross-Site Scripting'}
impact: {'value': '5'}
item
reason
- Detects JavaScript with(), ternary operators and XML predicate attacks
regexp
- (?:with\([^)]*\)\))|(?:\.\s*source\W)|(?:\?[^:]+:[^;]+;)
line
- xxx.28.xxx.249 - - [26/Aug/2008:00:00:13 -0700] "GET /d.AuthenticateUser1?p_page=http://webx.companyX.com/publish/01/na/en.html&p_HTTP_USER_AGENT=Microsoft%20Internet%20Explorer-4.0%20(compatible;%20MSIE%206.0;%20Windows%20NT%205.1;%20SV1;%20.NET%20CLR%201.1.4322) HTTP/1.0" 200 1547
item
reason
- Detects JavaScript with(), ternary operators and XML predicate attacks
regexp
- (?:with\([^)]*\)\))|(?:\.\s*source\W)|(?:\?[^:]+:[^;]+;)
line
- xxx.16.xxx.158 - - [26/Aug/2008:00:00:19 -0700] "GET /d.AuthenticateUser1?p_page=http://webx.companyX.com/publish/01/na/en.html&p_HTTP_USER_AGENT=Microsoft%20Internet%20Explorer-4.0%20(compatible;%20MSIE%206.0;%20Windows%20NT%205.1;%20SV1;%20.NET%20CLR%201.1.4322) HTTP/1.1" 200 1570
item
reason
- Detects JavaScript with(), ternary operators and XML predicate attacks
regexp
- (?:with\([^)]*\)\))|(?:\.\s*source\W)|(?:\?[^:]+:[^;]+;)
line
- xxx.84.xxx.90 - - [26/Aug/2008:00:00:41 -0700] "GET /d.AuthenticateUser1?p_page=http://webx.companyX.com/publish/01/vie/en.html&p_HTTP_USER_AGENT=Microsoft%20Internet%20Explorer-4.0%20(compatible;%20MSIE%206.0;%20Windows%20NT%205.1;%20SV1;%20.NET%20CLR%201.1.4322) HTTP/1.1" 200 1568
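For readers curious how a full parse of this kind could be produced, here is a rough sketch using Python's standard `xml.etree.ElementTree` module. It is not SEXR's actual code: the function name `dump` and the indentation are illustrative choices, and the embedded document is a one-item, hand-closed version of the Scalp XML shown earlier.

```python
import xml.etree.ElementTree as ET

# One-item, hand-closed version of the Scalp XML sample from this post.
SAMPLE = r"""<scalp file="apache_log" time="Sat-27-Dec-2008">
<attack type="xss" name="Cross-Site Scripting">
<impact value="5">
<item>
<reason><![CDATA[Detects JavaScript with(), ternary operators and XML predicate attacks]]></reason>
<regexp><![CDATA[(?:with\([^)]*\)\))|(?:\.\s*source\W)|(?:\?[^:]+:[^;]+;)]]></regexp>
<line><![CDATA[xxx.28.xxx.249 - - [26/Aug/2008:00:00:13 -0700] "GET /d.AuthenticateUser1?p_page=http://webx.companyX.com/publish/01/na/en.html&p_HTTP_USER_AGENT=Microsoft%20Internet%20Explorer-4.0%20(compatible;%20MSIE%206.0;%20Windows%20NT%205.1;%20SV1;%20.NET%20CLR%201.1.4322) HTTP/1.0" 200 1547
]]></line>
</item>
</impact>
</attack>
</scalp>"""


def dump(element, depth=0):
    """Yield one row per element (tag plus attributes), then its text if any.

    Hypothetical helper: walks the tree depth-first and indents by depth.
    """
    indent = "  " * depth
    if element.attrib:
        yield f"{indent}{element.tag}: {element.attrib}"
    else:
        yield f"{indent}{element.tag}"
    text = (element.text or "").strip()
    if text:
        yield f"{indent}- {text}"
    for child in element:
        yield from dump(child, depth + 1)


if __name__ == "__main__":
    for row in dump(ET.fromstring(SAMPLE)):
        print(row)
```

Because `element.attrib` is an ordinary dictionary, printing it directly gives the `{'file': 'apache_log', ...}` style of row seen in the output above.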
Although the full parse is helpful and informative, it is often more useful to see a summary of the alerts. SEXR’s count scan option strips out the extraneous detail and reports the number of alerts detected, grouped by the names of the alerts with which they are associated.
sexr.py: Conducting count scan of 1 files
scalp: {'file': 'apache_log', 'time': 'Sat-27-Dec-2008'}
attack: {'type': 'xss', 'name': 'Cross-Site Scripting'}
Impact 5 Items: 299
- 'Detects JavaScript with(), ternary operators and XML predicate attacks': 248
- 'Detects self-executing JavaScript functions': 51
Impact 4 Items: 655
- 'Detects common XSS concatenation patterns 1/2': 655
Impact 3 Items: 80
- 'Detects common comment types': 80
attack: {'type': 'lfi', 'name': 'Local File Inclusion'}
Impact 5 Items: 199
- 'Detects specific directory and path traversal': 199
attack: {'type': 'rfe', 'name': 'Remote File Execution'}
Impact 5 Items: 383
- 'Detects url injections and RFE attempts': 383
sexr.py: Done
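The idea behind a count scan can be sketched in a few lines with `collections.Counter`. This is an illustration rather than SEXR's real implementation: the names `count_scan` and `format_counts` are invented here, the sample document is hand-made with only `reason` elements in each item, and the nesting of several `impact` elements under one `attack` is inferred from the output above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-made miniature of a Scalp XML file (items reduced to their reason).
SAMPLE = """<scalp file="apache_log" time="Sat-27-Dec-2008">
<attack type="xss" name="Cross-Site Scripting">
<impact value="5">
<item><reason><![CDATA[Detects self-executing JavaScript functions]]></reason></item>
<item><reason><![CDATA[Detects self-executing JavaScript functions]]></reason></item>
<item><reason><![CDATA[Detects JavaScript with(), ternary operators and XML predicate attacks]]></reason></item>
</impact>
<impact value="3">
<item><reason><![CDATA[Detects common comment types]]></reason></item>
</impact>
</attack>
</scalp>"""


def count_scan(root):
    """Map (attack name, impact value) to a Counter of reason -> item count."""
    counts = {}
    for attack in root.iter("attack"):
        for impact in attack.iter("impact"):
            reasons = Counter(
                (item.findtext("reason") or "").strip()
                for item in impact.iter("item")
            )
            counts[(attack.get("name"), impact.get("value"))] = reasons
    return counts


def format_counts(counts):
    """Render the counts as text rows, most frequent reason first."""
    rows = []
    for (name, value), reasons in counts.items():
        rows.append(f"{name} / Impact {value} Items: {sum(reasons.values())}")
        for reason, n in reasons.most_common():
            rows.append(f"  - {reason!r}: {n}")
    return rows


if __name__ == "__main__":
    print("\n".join(format_counts(count_scan(ET.fromstring(SAMPLE)))))
```

Keying the result on the attack name and impact value keeps the grouping visible in the report, so a single reason that appears at two impact levels would still be counted separately.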
After determining the types of attacks that were detected in the Apache log files, it might be interesting to know where these attacks originated. SEXR’s source IP scan option provides the source IP address for all of the alerts detected and a count of how many times each source IP address was associated with the attack.
sexr.py: Conducting IP scan of 1 files
scalp: {'file': 'apache_log', 'time': 'Sat-27-Dec-2008'}
attack: {'type': 'xss', 'name': 'Cross-Site Scripting'}
Impact 5 Items: 299
- Total Source IP Addresses: 209
- xxx.176.xxx.42: 1
- xxx.177.xxx.68: 1
- xxx.129.xxx.3: 1
- xxx.89.xxx.250: 1
- xxx.179.xxx.235: 1
- xxx.253.xxx.222: 3
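A source IP scan along these lines could be approximated as follows. Again, this is a sketch and not SEXR's code: `ip_scan` is a made-up name, the log lines in the sample are placeholders that reuse two of the masked addresses above, and the code assumes the client address is the first whitespace-separated field of each logged line, as it is in the entries shown in this post.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-made miniature of a Scalp XML file; the requests are placeholders.
SAMPLE = """<scalp file="apache_log" time="Sat-27-Dec-2008">
<attack type="xss" name="Cross-Site Scripting">
<impact value="5">
<item><line><![CDATA[xxx.253.xxx.222 - - [26/Aug/2008:00:00:13 -0700] "GET /placeholder HTTP/1.1" 200 1
]]></line></item>
<item><line><![CDATA[xxx.176.xxx.42 - - [26/Aug/2008:00:00:14 -0700] "GET /placeholder HTTP/1.1" 200 1
]]></line></item>
<item><line><![CDATA[xxx.253.xxx.222 - - [26/Aug/2008:00:00:15 -0700] "GET /placeholder HTTP/1.1" 200 1
]]></line></item>
<item><line><![CDATA[xxx.253.xxx.222 - - [26/Aug/2008:00:00:16 -0700] "GET /placeholder HTTP/1.1" 200 1
]]></line></item>
</impact>
</attack>
</scalp>"""


def ip_scan(root):
    """Map (attack name, impact value) to a Counter of source address -> alerts."""
    counts = {}
    for attack in root.iter("attack"):
        for impact in attack.iter("impact"):
            addresses = Counter()
            for item in impact.iter("item"):
                fields = (item.findtext("line") or "").split()
                if fields:  # skip items without a usable log line
                    addresses[fields[0]] += 1
            counts[(attack.get("name"), impact.get("value"))] = addresses
    return counts


if __name__ == "__main__":
    for (name, value), addresses in ip_scan(ET.fromstring(SAMPLE)).items():
        print(f"{name} / Impact {value} Items: {sum(addresses.values())}")
        print(f"- Total Source IP Addresses: {len(addresses)}")
        for address, n in addresses.items():
            print(f"- {address}: {n}")
```

If the server uses a custom log format that does not start with the client address, the field index would need to change accordingly.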
Although Scalp does all of the heavy lifting, I believe that SEXR can play an important role in evaluating the results that Scalp produces. I hope that SEXR will help make the information provided by Scalp more usable for system, network, and application administrators as well as security professionals. To this end, Romain has allowed me to update a few portions of Scalp to make its XML output more informative, and to add SEXR and the Scalp DTD file to the source code available online with the Scalp project.
If you have any comments, recommendations, or updates for the code, please let me know.
Go forth and do good things,
Don C. Weber








