Filtering the content with sed

1.1 Introduction
1.2 Configuration
   1.2.1 Loading a module
   1.2.2 Adding a filter
      1.2.2.1 Adding an output filter
      1.2.2.2 Adding an input filter
   1.2.3 Sed scripts
   1.2.4 Removing a filter
   1.2.5 Examples
1.3 Interfaces
   1.3.1 Exported Interfaces
   1.3.2 Apache version requirement

1.1 Introduction

Sed is a line-oriented stream editor that is commonly used for search-and-replace. In Apache, sed has traditionally been used as an external content filter, for example through mod_ext_filter:

# mod_ext_filter directive to define a filter which
# replaces text in the response
ExtFilterDefine external_sed mode=output intype=text/html cmd="/bin/sed s/california/CA/g"

<Location />
SetOutputFilter external_sed
</Location>

In the above example, the external process /bin/sed replaces the string "california" with "CA". For every filter invocation a new /bin/sed process is created, which reads the content on standard input and writes the filtered content to standard output. This technique works, but it does not perform well: creating a process for every request is costly, and creating processes from a multithreaded server process can be even more costly. In addition, sed may not be available on all platforms.

mod_sed is an in-process content filter. Its filters implement the sed editing commands of the Solaris 10 sed program, as described in the sed man page. Unlike sed, however, mod_sed does not read data from standard input; instead, its filters act on the entity data exchanged between client and server. mod_sed can be used as an input filter, an output filter, or both. Because it is a content filter, it cannot be used to modify client or server HTTP headers.
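
For comparison, the mod_ext_filter configuration from the introduction could be expressed in-process roughly as follows. This is a sketch, assuming mod_sed is loaded; OutputSed has directory context, which in Apache also covers <Location> sections. Unlike the intype=text/html option of ExtFilterDefine, SetOutputFilter as written here applies Sed to every response under the location, not only to HTML.

```apache
<Location />
    # In-process replacement for the external_sed filter: no /bin/sed process is spawned
    SetOutputFilter Sed
    # Same substitution that the external /bin/sed command performed
    OutputSed "s/california/CA/g"
</Location>
```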

The mod_sed output filter accepts a chunk of data, executes the sed scripts on it, and passes the generated output to the next filter in the filter chain.

The mod_sed input filter reads data from the next filter in the filter chain, executes the sed scripts on it, and returns the generated data to the calling filter.

Both the input and the output filter process data line by line: content is processed only when a newline character is seen, and an incomplete line is held until the rest of it arrives. At the end of the data, whatever remains is treated as the last line.

1.2 Configuration

1.2.1 Loading a module

Apache modules can be compiled statically into the httpd binary or built as separate dynamic shared objects (DSOs). If mod_sed is built as a DSO, it must be loaded with Apache's LoadModule directive before it can be used, e.g.:
LoadModule sed_module modules/mod_sed.so
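
Where a configuration file may also be used on servers that do not load mod_sed, the mod_sed directives can be wrapped in an <IfModule> section keyed on the module name sed_module, so that Apache skips them instead of rejecting them as unknown directives. A minimal sketch, reusing the /var/www/docs/sed directory from the examples in section 1.2.5:

```apache
# Everything inside is ignored unless sed_module has been loaded
<IfModule sed_module>
    <Directory "/var/www/docs/sed">
        AddOutputFilter Sed html
        OutputSed "s/california/CA/g"
    </Directory>
</IfModule>
```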

1.2.2 Adding a filter

A Sed filter can be inserted with the Apache directives AddOutputFilter or AddInputFilter, with SetOutputFilter or SetInputFilter, or with similar filter-handling directives.
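
As an illustration, SetOutputFilter and SetInputFilter insert the filter for every request in their section, regardless of file extension. The sketch below assumes a hypothetical directory /var/www/docs/sed-all. Since this also sends images and other binary responses through Sed, restricting the filter by extension, as in the following subsections, is usually the safer choice.

```apache
<Directory "/var/www/docs/sed-all">
    # Run every response body served from this directory through Sed
    SetOutputFilter Sed
    OutputSed "s/california/CA/g"
    # Run every request body sent to this directory through Sed
    SetInputFilter Sed
    InputSed "s/california/CA/g"
</Directory>
```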

1.2.2.1 Adding an output filter

The following directive adds a Sed output filter to files with the given extension:
AddOutputFilter Sed <extension>
For example, to filter HTML files:
AddOutputFilter Sed html
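
AddOutputFilter also accepts several extensions in a single directive. A small sketch, assuming both .html and .htm files should be filtered:

```apache
# Associate the Sed output filter with both .html and .htm files
AddOutputFilter Sed html htm
```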

1.2.2.2 Adding an input filter

The following directive adds a Sed input filter to requests for files with the given extension:
AddInputFilter Sed <extension>
For example, to filter request data sent to PHP scripts:
AddInputFilter Sed php

1.2.3 Sed scripts

Sed scripts for the input and output filters are supplied with the mod_sed directives InputSed and OutputSed, respectively. For example:
InputSed "script"
OutputSed "script"

InputSed and OutputSed each take a single argument. Each directive can be repeated to specify multiple scripts; the scripts are executed in the order in which they appear in the configuration file.
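
To illustrate the ordering, the sketch below chains two scripts using the hypothetical strings foo, bar and baz. Assuming each script is applied to the output of the previous one, as with multiple -e options to sed, every "foo" in an HTML response ends up as "baz", and so does any "bar" that was already present.

```apache
<Directory "/var/www/docs/sed">
    AddOutputFilter Sed html
    # Applied first: foo -> bar
    OutputSed "s/foo/bar/g"
    # Applied second, to the result of the first script: bar -> baz
    OutputSed "s/bar/baz/g"
</Directory>
```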

1.2.4 Removing a filter

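A Sed filter can be removed with the Apache directives RemoveOutputFilter and RemoveInputFilter. These directives take file extensions rather than filter names, and they remove all output (or input) filter associations that AddOutputFilter (or AddInputFilter) made for those extensions, not only Sed.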

1.2.5 Examples

Adding an output filter:
<Directory "/var/www/docs/sed">
    AddOutputFilter Sed html
    OutputSed "s/california/CA/g"
    OutputSed "s/washington/WA/g"
</Directory>

The above example replaces the string "california" with "CA" and the string "washington" with "WA" in HTML documents before they are sent to the client.

Adding an input filter:
<Directory "/var/www/docs/sed">
    AddInputFilter Sed php
    InputSed "s/california/CA/g"
</Directory>

In the above example, the string "california" is replaced with "CA" in any request body data (such as submitted form data) sent to PHP scripts.

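Removing an output filter:
<Directory "/var/www/docs/sed/subdir1">
    RemoveOutputFilter html
</Directory>

In the above example, .html files in the subdir1 subdirectory are no longer passed through the Sed output filter configured for /var/www/docs/sed.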
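
An input filter is removed the same way. A sketch, assuming the input filter example above is in place and a hypothetical subdirectory subdir2 should receive unmodified request data; like RemoveOutputFilter, RemoveInputFilter takes only extensions:

```apache
<Directory "/var/www/docs/sed/subdir2">
    # Drop the input filter associations that AddInputFilter made for .php
    RemoveInputFilter php
</Directory>
```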

1.3 Interfaces

1.3.1 Exported Interfaces

Sed
    Classification: Filter name
    Context: Filter-handling directives
    Description: Tells Apache which filter to add. The filter can be added with several Apache directives and can be used as both an input and an output filter.

sed_module
    Classification: Module name
    Context: -
    Description: If mod_sed is compiled as a DSO, this module name must be given as an argument to LoadModule.

OutputSed
    Classification: Directive
    Context: directory
    Description: Specifies a sed script for the output filter; may be repeated to specify multiple scripts.

InputSed
    Classification: Directive
    Context: directory
    Description: Specifies a sed script for the input filter; may be repeated to specify multiple scripts.

1.3.2 Apache version requirement

mod_sed works with Apache 2.2.x and later.