issrg.utils
Class RFC2253NameParser

java.lang.Object
  extended by issrg.utils.RFC2253NameParser

public class RFC2253NameParser
extends java.lang.Object

This is an implementation of an RFC2253 LDAP DN parser. Many existing parsers are burdened with semantic interpretation: they check the attribute type names and OIDs. That is of little use in practice, because many people invent their own attributes just for their own LDAP directory, and such parsers will inevitably fail on them. It is much simpler to return the DN divided into RDNs, each of them separated into AVAs, with the attribute value represented as an unescaped String or a binary value. Let the caller cope with the attribute type names and OIDs!

Note also that, owing to a common misconception, people often join RDNs with comma-space instead of a bare comma, as the RFC specifies.

There is also an ambiguity in the said RFC (not the only one, though!): it says to join the RDNs in reverse order, but it does not say to parse them in reverse order, which means that the DN will be reversed after each parse-compose operation.

This parser supports OSF-syntax DNs as well, but not all of such DNs can be represented in RFC2253 form (a valid OSF DN "/C=gb/etc" cannot be converted into RFC2253 DN, because the last component doesn't have an attribute type).

Author:
A.Otenko

Field Summary
static char ASSIGN_CHAR
           
static char COMMA_CHAR
           
static char PLUS_CHAR
           
 
Constructor Summary
RFC2253NameParser()
           
 
Method Summary
protected static boolean ALPHA(char c)
          The ALPHA terminal.
protected static boolean ASSIGNMENT(char c)
          The ASSIGNMENT terminal.
protected static java.lang.String attributeType(java.text.CharacterIterator ci, boolean OSF)
          The attributeType non-terminal.
protected static java.lang.String[] attributeTypeAndValue(java.text.CharacterIterator ci, boolean OSF)
          The attributeTypeAndValue non-terminal.
protected static java.lang.String attributeValue(java.text.CharacterIterator ci, boolean OSF)
          The attributeValue non-terminal.
protected static boolean COMMA(char c)
          The COMMA terminal.
protected static boolean DIGIT(char c)
          The DIGIT terminal.
static java.lang.String[][][] distinguishedName(java.lang.String Name)
          The starting non-terminal, distinguishedName.
static java.lang.String escapeString(java.lang.String s)
          Same as escapeString( s, true );
static java.lang.String escapeString(java.lang.String s, boolean escape)
          Takes a Unicode String as input, converts every character outside the Latin alphabet and digits to a hexpair, and escapes all special characters.
protected static boolean hexchar(char c)
          The hexchar terminal.
protected static java.lang.String hexpair(java.text.CharacterIterator ci)
          The hexpair non-terminal.
protected static java.lang.String hexstring(java.text.CharacterIterator ci)
          The hexstring non-terminal.
protected static boolean keychar(char c)
          The keychar terminal.
protected static java.lang.String[][] name_component(java.text.CharacterIterator ci, boolean OSF)
          The name-component non-terminal.
protected static java.lang.String[][][] name(java.text.CharacterIterator n, boolean OSF)
          The name non-terminal.
protected static java.lang.String oid(java.text.CharacterIterator ci)
          The oid non-terminal.
protected static int onePair(java.text.CharacterIterator ci, boolean OSF)
           
protected static char pair(java.text.CharacterIterator ci, boolean OSF)
          The pair non-terminal.
protected static boolean PLUS(char c)
          The PLUS terminal.
protected static boolean QUOTATION(char c, boolean OSF)
          The QUOTATION terminal.
protected static boolean quotechar(char c, boolean OSF)
          The quotechar terminal.
protected static void skip_spaces(java.text.CharacterIterator ci)
          The skip_spaces() non-terminal.
protected static boolean SLASH(char c)
          The SLASH terminal.
protected static boolean special(char c, boolean OSF)
          The special terminal.
protected static java.lang.String string(java.text.CharacterIterator ci, boolean OSF)
          The string non-terminal.
protected static boolean stringchar(char c, boolean OSF)
          The stringchar terminal.
static java.lang.String toCanonicalDN(java.lang.String dn)
          This method will attempt to convert a given DN to canonical DN.
static java.lang.String toCanonicalDN(java.lang.String[][][] dn)
          Same as toCanonicalDN( dn, true );
static java.lang.String toCanonicalDN(java.lang.String[][][] dn, boolean escape)
          This method returns the canonical representation of the DN separated into arrays of strings, optionally escaping non-ASCII characters.
static java.lang.String toHexString(byte[] b)
          This routine converts a given byte array into a hexstring, prepended with a HASH_CHAR.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ASSIGN_CHAR

public static final char ASSIGN_CHAR
See Also:
Constant Field Values

COMMA_CHAR

public static final char COMMA_CHAR
See Also:
Constant Field Values

PLUS_CHAR

public static final char PLUS_CHAR
See Also:
Constant Field Values
Constructor Detail

RFC2253NameParser

public RFC2253NameParser()
Method Detail

distinguishedName

public static java.lang.String[][][] distinguishedName(java.lang.String Name)
                                                throws RFC2253ParsingException
The starting non-terminal, distinguishedName.

distinguishedName = [name]

Parameters:
Name - a string to be parsed into a Distinguished Name
Returns:
an array of RDNs, each of them an array of AVAs (AttributeValueAssertions). Each AVA is an array of two strings: the first is the name or the OID of the attribute, as specified in the DN, and the second is the value of the attribute; see also the toCanonicalDN method. Plain arrays are used so as not to introduce extra classes.
Throws:
RFC2253ParsingException - always contains a nested exception. The outer exception carries only basic information (the position at which the parsing error occurred), so there is not much use in printing its stack trace. The details of the fault, and the actual point of failure in the code, are contained in the nested exception, if you want to print the stack trace. The code does not throw any other exceptions, even run-time ones, except for IllegalArgumentException, which may occur if null is passed as the Name parameter.
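
The nested array shape described above can be illustrated without the parser itself. The sketch below builds by hand the kind of structure that distinguishedName is described as returning (the literal array is an assumption made for illustration, not captured parser output) and walks it RDN by RDN and AVA by AVA. The class DnStructureSketch and the method describe are hypothetical names, not part of issrg.utils.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of issrg.utils.
final class DnStructureSketch {
    /** Lists every AVA as "rdnIndex.avaIndex type=value". */
    static List<String> describe(String[][][] dn) {
        List<String> out = new ArrayList<>();
        for (int r = 0; r < dn.length; r++) {          // each RDN in the DN
            String[][] rdn = dn[r];
            for (int a = 0; a < rdn.length; a++) {     // each AVA in the RDN
                String[] ava = rdn[a];                 // ava[0] = type, ava[1] = value
                out.add(r + "." + a + " " + ava[0] + "=" + ava[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed shape for a DN with a two-AVA RDN followed by a one-AVA RDN.
        String[][][] dn = {
            { {"cn", "John"}, {"uid", "jd"} },
            { {"c", "gb"} }
        };
        describe(dn).forEach(System.out::println);
    }
}
```

Keeping the result as plain String arrays means the caller needs no parser-specific classes to inspect or modify a DN.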

toCanonicalDN

public static java.lang.String toCanonicalDN(java.lang.String[][][] dn,
                                             boolean escape)
This method returns the canonical representation of the DN separated into arrays of strings, optionally escaping non-ASCII characters.

It simply combines the parts of the dn in the following way: the attribute types in each AVA are converted to upper case (because some applications allow lowercase input of these) and attribute values are taken as is; the type and value of each AVA are joined using "=", all AVAs in the same RDN are combined using "+", and after that all RDNs are concatenated using ",".

dn is an array of RDNs. RDN is an array of AVA (Attribute Value Assertion). AVA is an array of two strings (after parsing using distinguishedName method). The string with index 0 is the attribute type. The string with index 1 is the attribute value. The value will be escaped using escapeString method.

If AVA is an array of more than two strings (the reference in RDN can be replaced by the user), and the string with index 2 is not null, it will be placed as the attribute value as is, instead of escaped string with index 1. This allows the user to provide the values of attributes that should be compared as binary (for example, "#" would match only this string of hexadecimal values).

String [][][] dn = distinguishedName("uid=aBc , c=gb");
String [][] uid_rdn = dn[0];
String [] uid_ava = uid_rdn[0];

uid_rdn[0] = new String[]{ uid_ava[0], uid_ava[1], toHexString(uid_ava[1].getBytes()) };

String canonicalDN = toCanonicalDN(dn);

In the example above the distinguished name will be parsed as an array of two RDNs, each of them having only one AVA. We are accessing the leftmost RDN. We are interested in replacing the AVA in it that corresponds to the "uid" attribute type, so the uid will be case sensitive ("aBc" is not the same as "ABC"). So we replace the corresponding AVA in the RDN with the new value - an array containing the user-defined string to be put in the RDN.

Note that in most cases conversion to the canonical DN will look like this:

String canonicalDN = toCanonicalDN(distinguishedName( nonCanonical ));

Parameters:
dn - is the parsed DN with the array format as described above
escape - if true, non-ASCII characters will be escaped as a hex pair; if false, the Unicode values will be returned as is
Returns:
String value containing the canonical RFC2253 DN
Throws:
java.lang.NullPointerException - or ArrayIndexOutOfBoundsException, if the dn is malformed (an AVA has fewer than 2 elements, or a null reference is encountered)
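
The composition rules described for this method can be sketched as follows. This is not the library's code: the class CanonicalDnSketch and the method compose are hypothetical, value escaping is left out entirely, and joining the AVAs of one RDN with "+" is an assumption taken from RFC2253 syntax. The sketch shows only the upper-casing of attribute types and the preference for a non-null third element over the value at index 1.

```java
import java.util.Locale;

// Hypothetical re-implementation sketch, not part of issrg.utils.
final class CanonicalDnSketch {
    static String compose(String[][][] dn) {
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < dn.length; r++) {
            if (r > 0) sb.append(',');                 // RDN separator
            String[][] rdn = dn[r];
            for (int a = 0; a < rdn.length; a++) {
                if (a > 0) sb.append('+');             // AVA separator (assumed)
                String[] ava = rdn[a];
                // A non-null third element replaces the value verbatim.
                String value = (ava.length > 2 && ava[2] != null) ? ava[2] : ava[1];
                sb.append(ava[0].toUpperCase(Locale.ROOT)).append('=').append(value);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[][][] dn = {
            { {"uid", "aBc", "#614263"} },             // caller-supplied binary form
            { {"c", "gb"} }
        };
        System.out.println(compose(dn));
    }
}
```

A malformed input (a null entry, or an AVA with fewer than two elements) makes this sketch fail with the same kinds of run-time exceptions that the Throws clause above mentions.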

toCanonicalDN

public static java.lang.String toCanonicalDN(java.lang.String[][][] dn)
Same as toCanonicalDN( dn, true );


toCanonicalDN

public static java.lang.String toCanonicalDN(java.lang.String dn)
This method will attempt to convert a given DN to canonical DN. If it is not a DN, it will return null.

This is the same as calling toCanonicalDN(distinguishedName(dn)), but is more convenient, because it doesn't throw exceptions.

Parameters:
dn - the DN to convert to canonical form; can be null
Returns:
the canonical representation of the DN, or null, if it is not a DN.

name

protected static java.lang.String[][][] name(java.text.CharacterIterator n,
                                             boolean OSF)
                                      throws RFC2253ParsingException
The name non-terminal.

Parameters:
n - the CharacterIterator where the current position points to a distinguished name
OSF - if true, OSF syntax is assumed; if false, RFC2253 syntax is assumed
Returns:
a DN array of RDN arrays of AVA arrays
Throws:
RFC2253ParsingException

name_component

protected static java.lang.String[][] name_component(java.text.CharacterIterator ci,
                                                     boolean OSF)
                                              throws RFC2253ParsingException
The name-component non-terminal.

Parameters:
ci - the CharacterIterator, where the current position points to an RDN
OSF - if true, OSF syntax is assumed; if false, RFC2253 syntax is assumed
Returns:
an RDN array of AVA arrays
Throws:
RFC2253ParsingException

attributeTypeAndValue

protected static java.lang.String[] attributeTypeAndValue(java.text.CharacterIterator ci,
                                                          boolean OSF)
                                                   throws RFC2253ParsingException
The attributeTypeAndValue non-terminal.

Parameters:
ci - the CharacterIterator, where the current position points to an AVA
OSF - if true, OSF syntax is assumed; if false, RFC2253 syntax is assumed
Returns:
an AVA array (two elements: at 0 - attribute type, at 1 or 2 - attribute value)
Throws:
RFC2253ParsingException

attributeType

protected static java.lang.String attributeType(java.text.CharacterIterator ci,
                                                boolean OSF)
                                         throws RFC2253ParsingException
The attributeType non-terminal. There seems to be a typo in the RFC:

attributeType = (ALPHA 1*keychar) / oid

should perhaps read

attributeType = (ALPHA *keychar) / oid

Otherwise, attributeType L (locality) would not be accepted.

Parameters:
ci - the CharacterIterator, where the current position points to an attribute type
OSF - if true, OSF syntax is assumed; if false, RFC2253 syntax is assumed
Returns:
an attribute type as an "oid." or the type
Throws:
RFC2253ParsingException
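
The difference between the two readings of the attributeType rule can be sketched with a small matcher for the keyword form (the oid alternative is left out). The class AttributeTypeSketch and its methods are hypothetical; only the definitions of ALPHA, DIGIT and keychar follow the descriptions in this class. The parameter minKeychars selects between "ALPHA 1*keychar" (1) and "ALPHA *keychar" (0).

```java
// Hypothetical illustration, not part of issrg.utils.
final class AttributeTypeSketch {
    static boolean isAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** keychar = ALPHA / DIGIT / "-" */
    static boolean isKeychar(char c) {
        return isAlpha(c) || isDigit(c) || c == '-';
    }

    /** Matches ALPHA followed by at least minKeychars keychars. */
    static boolean matches(String s, int minKeychars) {
        if (s.isEmpty() || !isAlpha(s.charAt(0))) return false;
        for (int i = 1; i < s.length(); i++) {
            if (!isKeychar(s.charAt(i))) return false;
        }
        return s.length() - 1 >= minKeychars;
    }

    public static void main(String[] args) {
        System.out.println("L with 1*keychar: " + matches("L", 1));
        System.out.println("L with *keychar:  " + matches("L", 0));
    }
}
```

Under the strict reading a single-letter type such as L is rejected, which is the problem the note above describes.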

keychar

protected static boolean keychar(char c)
The keychar terminal. Someone could call it a non-terminal, but since it represents the smallest granularity of the input, it is a terminal.

Parameters:
c - the character to test
Returns:
true, if it is a keychar (HYPHEN_CHAR or ALPHA(c) or DIGIT(c))

oid

protected static java.lang.String oid(java.text.CharacterIterator ci)
                               throws RFC2253ParsingException
The oid non-terminal.

Parameters:
ci - the CharacterIterator, where the current position points to an attribute type expressed as an OID
Returns:
the OID string without "oid." prefix
Throws:
RFC2253ParsingException

attributeValue

protected static java.lang.String attributeValue(java.text.CharacterIterator ci,
                                                 boolean OSF)
                                          throws RFC2253ParsingException
The attributeValue non-terminal.

Parameters:
ci - the CharacterIterator, where the current position points to an attribute value
OSF - if true, OSF syntax is assumed; if false, RFC2253 syntax is assumed
Returns:
attribute value
Throws:
RFC2253ParsingException

string

protected static java.lang.String string(java.text.CharacterIterator ci,
                                         boolean OSF)
                                  throws RFC2253ParsingException
The string non-terminal. The specification is not quite clear about trailing space characters: is it still possible to have a value '\ hi hix\ '? This implementation treats it as possible, although the syntax does not mention escaping a space in this way.

Parameters:
ci - the CharacterIterator, where the current position points to a string value of an attribute
OSF - if true, OSF syntax is assumed; if false, RFC2253 syntax is assumed
Returns:
unescaped string value
Throws:
RFC2253ParsingException

quotechar

protected static boolean quotechar(char c,
                                   boolean OSF)
The quotechar terminal.

Parameters:
c - the character to be tested
OSF - if true, OSF syntax is assumed; if false, RFC2253 syntax is assumed
Returns:
true, if c is a quotechar as defined in RFC2253; false otherwise

special

protected static boolean special(char c,
                                 boolean OSF)
The special terminal.

Parameters:
c - the character to be tested
OSF - if true, OSF syntax is assumed; if false, RFC2253 syntax is assumed
Returns:
true, if c is a special as defined in RFC2253; false otherwise

pair

protected static char pair(java.text.CharacterIterator ci,
                           boolean OSF)
                    throws RFC2253ParsingException
The pair non-terminal. Note that it also allows a SPACE_CHAR to be escaped, for consistency with the DN-to-string conversion rules, which require that an escaped trailing space be accepted.

It may read multiple hex pairs escaped with "\" to fully decode the UTF-8 character.

Parameters:
ci - the CharacterIterator, where the current position points to a character expressed through the escape character "\" and the character code
OSF - if true, OSF syntax is assumed; if false, RFC2253 syntax is assumed
Returns:
unescaped character
Throws:
RFC2253ParsingException - on failure, the ci pointer is restored to the position it had on input; the method thus acts like the terminals: the pointer either moves over the whole entity or does not move at all.
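
Reading several "\XX" escapes to recover one UTF-8 character can be sketched as below. This is a simplified illustration rather than the pair method itself: the class EscapedUtf8Sketch and the method readEscapedBytes are hypothetical, the sketch decodes a whole run of consecutive escapes into a String instead of returning a single char, and it does not handle escapes of special characters such as "\,".

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

// Hypothetical illustration, not part of issrg.utils.
final class EscapedUtf8Sketch {
    /** Value of an ASCII hex digit, or -1 for any other character. */
    static int hexValue(char c) {
        return c < 128 ? Character.digit(c, 16) : -1;
    }

    /** Consumes consecutive "\XX" escapes and decodes the bytes as UTF-8. */
    static String readEscapedBytes(CharacterIterator ci) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        while (ci.current() == '\\') {
            int start = ci.getIndex();
            int hi = hexValue(ci.next());
            int lo = hexValue(ci.next());
            if (hi < 0 || lo < 0) {
                ci.setIndex(start);      // not a hex escape: leave it untouched
                break;
            }
            bytes.write((hi << 4) | lo);
            ci.next();                   // step past the second hex digit
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        CharacterIterator ci = new StringCharacterIterator("\\C4\\8Dx");
        String decoded = readEscapedBytes(ci);
        System.out.println((int) decoded.charAt(0) + " stopped at index " + ci.getIndex());
    }
}
```

The two escapes \C4\8D are the UTF-8 encoding of the single character U+010D, which is why more than one hex pair may have to be consumed.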

onePair

protected static int onePair(java.text.CharacterIterator ci,
                             boolean OSF)
                      throws RFC2253ParsingException
Throws:
RFC2253ParsingException

stringchar

protected static boolean stringchar(char c,
                                    boolean OSF)
The stringchar terminal.

Parameters:
c - the character to be tested
OSF - if true, OSF syntax is assumed; if false, RFC2253 syntax is assumed
Returns:
true, if c is a stringchar as defined in RFC2253; false otherwise

hexstring

protected static java.lang.String hexstring(java.text.CharacterIterator ci)
                                     throws RFC2253ParsingException
The hexstring non-terminal.

Parameters:
ci - the CharacterIterator, where the current position points to a value expressed as a hexstring
Returns:
unescaped hexstring, where each character has the code corresponding to the hexstring
Throws:
RFC2253ParsingException

hexpair

protected static java.lang.String hexpair(java.text.CharacterIterator ci)
                                   throws RFC2253ParsingException
The hexpair non-terminal.

Parameters:
ci - the CharacterIterator, where the current position points to a single pair of hexadecimal digits
Throws:
RFC2253ParsingException - like the pair non-terminal, the method restores the pointer to its initial position on failure.
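
The all-or-nothing pointer behaviour described for hexpair and pair can be sketched with java.text.CharacterIterator alone. The class HexpairSketch is hypothetical, and IllegalArgumentException stands in for RFC2253ParsingException, whose constructors are not shown in this documentation.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

// Hypothetical illustration, not part of issrg.utils.
final class HexpairSketch {
    static boolean isHexchar(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    /**
     * Returns the two hex digits at the current position and moves past them,
     * or throws and leaves the position exactly where it was.
     */
    static String hexpair(CharacterIterator ci) {
        int start = ci.getIndex();           // remember where we began
        char first = ci.current();
        char second = ci.next();
        if (!isHexchar(first) || !isHexchar(second)) {
            ci.setIndex(start);              // backtrack before reporting failure
            throw new IllegalArgumentException("hexpair expected at position " + start);
        }
        ci.next();                           // consume the second digit
        return "" + first + second;
    }

    public static void main(String[] args) {
        CharacterIterator ci = new StringCharacterIterator("4Fz");
        System.out.println(hexpair(ci) + " then index " + ci.getIndex());
    }
}
```

Because a failed call leaves the iterator untouched, a caller can try an alternative production at the same position without saving the index itself.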

hexchar

protected static boolean hexchar(char c)
The hexchar terminal.

Parameters:
c - the character to be tested
Returns:
true, if c is a hexchar as defined in RFC2253; false otherwise

DIGIT

protected static boolean DIGIT(char c)
The DIGIT terminal.

Parameters:
c - the character to be tested
Returns:
true, if c is a digit as defined in RFC2253; false otherwise

ALPHA

protected static boolean ALPHA(char c)
The ALPHA terminal.

Parameters:
c - the character to be tested
Returns:
true, if c is an alpha as defined in RFC2253; false otherwise

QUOTATION

protected static boolean QUOTATION(char c,
                                   boolean OSF)
The QUOTATION terminal.

Parameters:
c - the character to be tested
OSF - if true, OSF syntax is assumed; if false, RFC2253 syntax is assumed
Returns:
true, if c is a quotation as defined in RFC2253 or OSF; false otherwise

skip_spaces

protected static void skip_spaces(java.text.CharacterIterator ci)
The skip_spaces() non-terminal.

skip_spaces = *space

Parameters:
ci - the CharacterIterator, where the current position points to a sequence of spaces
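
The rule skip_spaces = *space amounts to advancing the iterator while the current character is a space. A minimal sketch follows; the class SkipSpacesSketch is hypothetical, and treating only ' ' (U+0020) as a space is an assumption.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

// Hypothetical illustration, not part of issrg.utils.
final class SkipSpacesSketch {
    /** Moves the iterator past zero or more space characters. */
    static void skipSpaces(CharacterIterator ci) {
        // current() returns CharacterIterator.DONE at the end, which ends the loop.
        while (ci.current() == ' ') {
            ci.next();
        }
    }

    public static void main(String[] args) {
        CharacterIterator ci = new StringCharacterIterator("   cn=x");
        skipSpaces(ci);
        System.out.println("index " + ci.getIndex() + ", current '" + ci.current() + "'");
    }
}
```

Since zero spaces is a valid match, the method cannot fail and needs no backtracking.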

COMMA

protected static boolean COMMA(char c)
The COMMA terminal.

COMMA = "," / ";"

Parameters:
c - the character to be tested
Returns:
true, if c is a comma as defined in RFC2253; false otherwise

PLUS

protected static boolean PLUS(char c)
The PLUS terminal.

PLUS = "+"

Parameters:
c - the character to be tested
Returns:
true, if c is a PLUS as defined in RFC2253; false otherwise

ASSIGNMENT

protected static boolean ASSIGNMENT(char c)
The ASSIGNMENT terminal.

ASSIGNMENT = "="

Parameters:
c - the character to be tested
Returns:
true, if c is an assignment as defined in RFC2253; false otherwise

SLASH

protected static boolean SLASH(char c)
The SLASH terminal.

SLASH = "/"

Parameters:
c - the character to be tested
Returns:
true, if c is a slash, as used in OSF syntax; false otherwise

toHexString

public static java.lang.String toHexString(byte[] b)
This routine converts a given byte array into a hexstring, prepended with a HASH_CHAR. The array can be empty, but not null; in the latter case an IllegalArgumentException is thrown, whilst in the former case an empty string is returned (see hexstring syntax spec).

Parameters:
b - the byte array to be converted into a hexstring
Returns:
a hexstring with the leading "#", or an empty string, if b has zero length
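
The documented contract (null rejected, empty array gives an empty string, otherwise "#" followed by two hex digits per byte) can be sketched as follows. The class HexStringSketch is hypothetical, and the use of lower-case hex digits is an assumption; the case produced by the real method is not stated here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical re-implementation sketch, not part of issrg.utils.
final class HexStringSketch {
    static String toHexString(byte[] b) {
        if (b == null) {
            throw new IllegalArgumentException("byte array must not be null");
        }
        if (b.length == 0) {
            return "";                                   // no "#" for an empty array
        }
        StringBuilder sb = new StringBuilder("#");
        for (byte x : b) {
            sb.append(String.format("%02x", x & 0xFF));  // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHexString("aBc".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Passing an explicit charset to getBytes avoids depending on the platform default encoding when preparing a binary value for an AVA.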

escapeString

public static java.lang.String escapeString(java.lang.String s,
                                            boolean escape)
Takes a Unicode String as input, converts every character outside the Latin alphabet and digits to a hexpair, and escapes all special characters.

Parameters:
s - the string to convert
Returns:
the string where all characters outside latin alphabet have been escaped, including spaces and other special characters
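
One way to read the description above is sketched below. It is an assumption, not the method's actual algorithm: the class EscapeSketch is hypothetical, every character other than an ASCII letter or digit is written as backslash-prefixed hex pairs of its UTF-8 bytes (a form RFC2253 permits for any character), upper-case hex digits are chosen arbitrarily, and the escape flag is not modelled. The real method may instead escape special characters as "\," and so on.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not part of issrg.utils.
final class EscapeSketch {
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);            // handles surrogate pairs
            boolean plain = (cp >= 'a' && cp <= 'z')
                    || (cp >= 'A' && cp <= 'Z')
                    || (cp >= '0' && cp <= '9');
            if (plain) {
                sb.append((char) cp);
                continue;
            }
            // Everything else: one "\XX" per UTF-8 byte of the character.
            byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            for (byte b : utf8) {
                sb.append('\\').append(String.format("%02X", b & 0xFF));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a b,\u010D"));
    }
}
```

Escaping spaces and separators this way keeps the output unambiguous when it is later split on "," and "+".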

escapeString

public static java.lang.String escapeString(java.lang.String s)
Same as escapeString( s, true );