Date: Sat, 30 Mar 2002 18:20:06 -0500 (EST)
From: Stan Brown <stan@oakroadsystems.com>
Subject: Re: PCRE question: word boundaries and word characters

>On Sat, 23 Mar 2002, Stan Brown wrote:
>> Is there any way to alter the meaning of a "word" character,

>Yes, but only provided you are prepared to read some of the code and
>understand what's going on.

>1. Read the documentation about the pcre_maketables() function.
...
>4. Write code to modify the tables that it creates, before passing them to
>pcre_compile().

>If you do it that way, you can do different things in different
>circumstances, and you have not modified the code of PCRE itself, so you
>don't have to maintain a patch for different releases.

That's the approach I have taken. I append the code. It's not
terribly exciting, but reusing it might be marginally easier than
writing fresh code. :-) Please feel free to include it, omit it,
or include an altered version with your next PCRE or in any other
way you see fit.

-- 
Regards,
Stan Brown, Oak Road Systems, Cortland County, NY, USA
                             http://oakroadsystems.com
                        mailto:stan@oakroadsystems.com

Source file worddefine.c follows:

/*******************************************************************************

pcre_worddefine( ): redefine "word" characters for use with PCRE

usage: pcre_worddefine(tables, charblock)

tables is the return from pcre_maketables( ). Caution: that is a pointer to
       const, so you will have to cast away the const to pass it to
       pcre_worddefine( )

charblock is an unsigned char array of 256/8 = 32 bytes (256 bits), with each
       bit set, or not, according to whether the corresponding character is a
       "word" character; character c is bit (c & 7) of byte c/8
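Example (needs <ctype.h> and <string.h>):
    const unsigned char *tables = pcre_maketables();
    unsigned char block[256/8];
    int i;
    memset(block, 0, sizeof block);
    for (i=0; i<256; ++i)
        if (isgraph(i))
            block[ i/8 ] |= 1 << (i&7);
    pcre_worddefine((unsigned char *)tables, block);
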
2002-03-26  new program;  author: Stan Brown, Oak Road Systems

               Copyright 2002       Stan Brown, Oak Road Systems
                           http://oakroadsystems.com


PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language. It
was written by Philip Hazel <ph10@cam.ac.uk>.

Permission is granted to anyone to use this software for any purpose on any
computer system, and to redistribute it freely, subject to the following
restrictions:

1. This software is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

2. The origin of this software must not be misrepresented, either by
   explicit claim or by omission.

3. Altered versions must be plainly marked as such, and must not be
   misrepresented as being the original software.

4. If PCRE is embedded in any software that is released under the GNU
   General Purpose Licence (GPL), then the terms of that licence shall
   supersede any condition above with which it is incompatible.

*******************************************************************************/
#include "internal.h"


void
pcre_worddefine(
        unsigned char *tables,
        const unsigned char *charblock)
{

    int i;
    unsigned char *p;

    /* 1. Copy 'charblock' to the table of "word" characters. */

    memcpy(tables+cbits_offset+cbit_word, charblock, 256/8);

    /* 2. Update the "word"-character bits in the character type table. */

    p = tables + ctypes_offset;
    for (i = 0; i < 256; i++) {
        if ( charblock[i/8] & (1 << (i&7)) )
            *p++ |= ctype_word;
        else
            *p++ &= (~ctype_word);
    }
}

/* end of worddefine.c */

