Wednesday, July 22, 2009

Episode Four: Not Enough Keys in the Keyboard

The basic character set of the C programming language is a subset of the ASCII character set that includes nine characters which lie outside the ISO 646 invariant character set. Trigraphs were invented as a way of entering source code using keyboards that support any version of the ISO 646 character set.

Trigraph    Replacement
??=         #
??/         \
??'         ^
??(         [
??)         ]
??!         |
??<         {
??>         }
??-         ~

Trigraphs are replaced in the very first translation phase, before the source is split into tokens. This means they are replaced everywhere, including inside string literals and comments, which can become the cause of subtle bugs.

· Within a string literal:
"He said 'Hello???'." becomes "He said 'Hello?^.". The first ? is left alone, and the ??' that follows it is replaced by ^.
· Within a line comment, where ??/ becomes a backslash that splices the next line onto the comment:
// What's wrong with this??/
int i = 0; // This line is also inside the line comment.
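
To make the replacement rule concrete, here is a minimal sketch that mimics it on an ordinary string at run time. The function name replace_trigraphs is a hypothetical helper made up for this post, not something a compiler or library provides. The lookup table stores only the third character of each trigraph, so the example itself contains no trigraph and compiles the same way whether or not the compiler has trigraphs enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Third characters of the nine trigraphs and their replacements,
   index-aligned. Storing only the third character keeps actual
   trigraph sequences out of this source file. */
static const char TRIGRAPH_THIRD[] = "=/'()!<>-";
static const char TRIGRAPH_REPL[]  = "#\\^[]|{}~";

/* Hypothetical helper: copies src to dst, replacing every trigraph the
   way the compiler does before tokenization. dst must have room for
   strlen(src) + 1 bytes; the output is never longer than the input. */
void replace_trigraphs(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0') {
            const char *hit = strchr(TRIGRAPH_THIRD, src[2]);
            if (hit != NULL) {
                /* Three input characters collapse into one. */
                *dst++ = TRIGRAPH_REPL[hit - TRIGRAPH_THIRD];
                src += 3;
                continue;
            }
        }
        /* Not the start of a trigraph: copy the character unchanged. */
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

Feeding it the string literal example from above gives the same result the compiler would produce. When writing such an input in C source, spell the question marks as ?\?\? so that the literal itself is not rewritten.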

Digraphs were supplied as more readable alternatives to five of the trigraphs; a sixth digraph, %:%:, stands for the ## operator.
Digraph     Replacement
<:          [
:>          ]
<%          {
%>          }
%:          #
%:%:        ##

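Unlike trigraphs, digraphs are handled during tokenization: each one is simply an alternative spelling of a token. A digraph inside a string literal or a comment is therefore left untouched, so digraphs cannot cause this kind of bug.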
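
As a small illustration, the sketch below uses digraphs for its braces and brackets. The function names digraph_at and digraph_text are made up for this post. It assumes a compiler in a mode that recognizes digraphs, which is the case from the 1994 amendment to C onwards; a strict C89 mode may reject it.

```c
#include <assert.h>

/* Returns one element of a small table. The braces and brackets are
   all spelled with digraphs: <% %> for { } and <: :> for [ ]. */
int digraph_at(int i)
<%
    static const int values<:3:> = <% 10, 20, 30 %>;
    assert(i >= 0 && i < 3);
    return values<:i:>;
%>

/* Inside a string literal the same characters are not tokens,
   so they stay exactly as typed. */
const char *digraph_text(void)
<%
    return "<:";
%>
```

Here digraph_at(1) indexes the array as usual, while digraph_text() returns the two characters < and : rather than a single [.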
