How Portal Search handles special characters when indexing
Portal Search indexes words that are composed of consecutive literals, that is, letters, digits, and special characters. This section describes how Portal Search handles special characters during indexing.
These special characters include the following:
- The hash or pound sign (**\#**).
- The percent sign (**%**).
- The plus sign (**+**).
- The asterisk (**\***).
During indexing, special characters are handled as follows:
- Blank or white space; this includes the tab

  Blanks separate words and are not indexed. Example: The string key board is indexed as two separate words, key and board.

- Line break or new line

  Line breaks separate words and are not indexed unless they are preceded by a dash (-). Examples:

  - The string key board, with a line break between key and board, is indexed as two separate words, key and board.
  - The string key- board, with a line break after the dash, is indexed as one word, keyboard.

- Dot or sentence end period (.) and comma (,)

  Dots and commas separate words and are not indexed, unless they are both preceded and followed by a letter or digit. Example: The string www.ibm.com is indexed as www.ibm.com and not as three separate words.

- Question mark (?) and exclamation mark (!)

  Question marks and exclamation marks separate words and are not indexed unless they are followed by a letter.

- Other punctuation: ( ) { } [ ] < > ; : / \ | " _ -

  These characters separate words and are not indexed.

- Other characters

  All other characters are removed from the strings in which they appear but do not separate words.
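
The indexing rules above can be approximated in a few lines of Python. This is only an illustrative sketch, not the Portal Search implementation: the function name `tokenize` and the two character sets are hypothetical. It also assumes that a question mark or exclamation mark followed by a letter stays inside the word, which the rules above imply but do not state outright.

```python
import re

# Special characters that are indexed as part of a word.
WORD_CHARS = set("#%+*")
# Punctuation that always separates words and is never indexed.
SEPARATORS = set("(){}[]<>;:/\\|\"_-")


def tokenize(text: str) -> list[str]:
    """Split text into words following the indexing rules described above."""
    # A dash directly before a line break joins the two halves into one word.
    text = re.sub(r"-\r?\n", "", text)
    words: list[str] = []
    current: list[str] = []

    def flush() -> None:
        # Close the word being built, if any.
        if current:
            words.append("".join(current))
            current.clear()

    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch.isalnum() or ch in WORD_CHARS:
            current.append(ch)
        elif ch.isspace() or ch in SEPARATORS:
            flush()
        elif ch in ".,":
            # Kept only when surrounded by letters or digits on both sides.
            if prev.isalnum() and nxt.isalnum():
                current.append(ch)
            else:
                flush()
        elif ch in "?!":
            # Assumption: kept in the word when followed by a letter.
            if nxt.isalpha():
                current.append(ch)
            else:
                flush()
        # Any other character is dropped without splitting the word.
    flush()
    return words
```

For example, `tokenize("key-\nboard")` yields a single word, while `tokenize("key\nboard")` yields two.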
Note
- All characters that split words are discarded during indexing and searching.
- The previous statements apply to indexing. In a search query, however, all characters that can be part of the search syntax are interpreted as search syntax and not as part of the search term. These are the plus (+) and minus (-) signs, double quotation marks ("), and the asterisk wild card character (\*). If users want to include such characters in their search query, they must enclose them in double quotation marks. For example, "+hello" searches for the string +hello; "\*Hello\*" searches for the string \*Hello\*.
- The less than ( < ) and greater than ( > ) symbols are special HTML characters that Search cannot handle.
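
A client that builds queries programmatically could apply the quoting rule from the note with a small helper. The following is a minimal sketch under stated assumptions: the name `quote_term` is hypothetical and not part of any Portal Search API, and because the note does not say how a double quotation mark inside a term should be escaped, the sketch simply rejects such terms.

```python
# Characters that the note lists as search syntax.
SYNTAX_CHARS = set('+-*"')


def quote_term(term: str) -> str:
    """Wrap a term in double quotation marks if it contains search syntax."""
    if '"' in term:
        # Assumption: escaping of embedded quotation marks is not documented,
        # so this sketch refuses the term instead of guessing.
        raise ValueError("terms containing a double quotation mark are not supported")
    if any(ch in SYNTAX_CHARS for ch in term):
        return f'"{term}"'
    return term
```

With this helper, `quote_term("+hello")` produces `"+hello"` including the quotation marks, and a plain term such as `hello` is returned unchanged.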