Identifying the right Identifier: CamelCase


Identifiers Remembered

By Chris Douce

This article is an attempt to highlight some of the research papers and books cited during an interesting discussion that began with the seemingly innocuous question:

  • Does anyone know of studies investigating the effect of casing on readability of program source code?

This original posting was later expanded to define 'harder to read':

  • The time taken to recognize whether or not a sequence of letters forms a legal word, as opposed to the time and effort taken to read a whole sentence in upper case or mixed case.

The word readability was key. Readability can be read in different ways.

On one hand, readability can be synonymous with that equally difficult term, comprehensibility; on the other, it could be viewed as perceptibility, particularly in the case of programming tools such as text editors and development environments. This raises interesting questions, especially since many of our software tools still use the somewhat impoverished VT100 terminal environment.

The subject of readability (in perceptibility terms) is one that immediately begins to cross boundaries. One of the earliest posts introduced the topic of typography.

Is there something special about reading from the screen as opposed to reading from the page? HCI practitioners have been aware of such differences for a considerable time. After all, we usually read source directly from the screen before choosing to print, so that we can annotate our printouts with notes and arrows (using our highly viscous pens and pencils).

One of the most appropriate references to be suggested was:

  • Baecker, R.M. and A. Marcus (1990). Human Factors and Typography for More Readable Programs. Reading, MA, Addison-Wesley Publishing Co. (ACM Press)

A related paper is:

  • Baecker, R. (1986). Design Principles for the Enhanced Presentation of Computer Program Source Text. Proceedings of CHI'86 Conference on Human Factors in Computing Systems. M. Mantei and P. Orbeton. Boston, ACM: 51-58

Other interesting papers include:

  • Boyarski, D. et al. (1998) A Study of Fonts Designed for Screen Display Crafting Designs. Proceedings of ACM CHI 98 Conference on Human Factors in Computing Systems. p.87-94
  • Dyson, M. C. and Haselgrove, M (2001) The Influence of Reading Speed and Line Length on the Effectiveness of Reading from Screen. International Journal of Human-Computer Studies v.54 n.4 p.585-612

Frank Wales unearthed an interesting link that explores the origins of a programming convention that we know as CamelCase:

The immediate question is: is there some empirical research out there that tells us definitively that CamelCase is more useful than writingtextlikethis? One hypothesis is that it enhances readability because upper and lower case letters have different shapes, which of course allows us to identify word boundaries more easily.

  • A. and Cho, J. (2000). Letter case and text legibility. Supplement to Perception, 29, 45.

During this discussion, Derek Jones provided useful pointers to his commentary on the C99 specification. Two links are particularly appropriate, providing us with a set of very interesting references:

In this section from his forthcoming book, Derek describes what is called 'early vision', the phase of vision performed without apparent effort, and goes on to describe the rules of Gestalt perception and edge detection, as well as the processes of reading (eye fixation) and some interesting models of reading.

The second link explores the issue of identifiers and their processing in greater depth, providing a wealth of associated information that is again well referenced.

Particularly relevant are the sections on identifier spelling, identifier spelling choices and further explorations relating to human language, memorability, confusability (my favourite) and usability.

Derek also provides us with a reference to research surrounding extreme case alternation (which could descend into altercation if used in anger):

  • Herdman, C. M., Chernecki, D. and Dennis, N. (1999) Naming cAsE aLtErNaTeD words, Memory and Cognition, Vol. 27, No. 2, pp 254-266.

There is an interesting distinction that can be made here between intended and unrelated case changes. However, in terms of understanding the perceptual system that programmers constantly use, this research is particularly pertinent.
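To make the difference between the two kinds of case change concrete, here is a minimal Python sketch. The function name alternate_case is hypothetical, and this is only an illustration of the kind of text named in the title above, not the procedure used to prepare stimuli in the cited study. It flips case on every letter regardless of word boundaries, so the case changes carry no information about the structure of an identifier.

```python
def alternate_case(word: str) -> str:
    """Return the word with its letters alternating lower/upper case.

    Illustrative only: case changes here ignore word boundaries,
    unlike CamelCase, where each capital marks the start of a word.
    """
    out = []
    upper = False  # the first letter is emitted in lower case
    for ch in word:
        if ch.isalpha():
            out.append(ch.upper() if upper else ch.lower())
            upper = not upper
        else:
            # Non-letters pass through and do not advance the alternation.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(alternate_case("alternated"))
    print(alternate_case("CamelCase"))
```

Applied to an identifier such as CamelCase, the result no longer shows where one word ends and the next begins, which is the sense in which these case changes are unrelated to the identifier's meaning.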

Programmers use secondary notation (such as indentation), and the efficacy of particular typefaces to highlight program syntax (or role, or slice) could potentially support programming (and its related activities).

CamelCase could be perceived as a form of secondary notation for identifiers, where the case alteration distinguishes between useful identifier elements.
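As a rough illustration of how case changes mark out identifier elements, the following Python sketch splits an identifier at its case transitions. The function name split_camel_case and the handling of acronyms and digits are assumptions made for this example; real naming conventions vary, and this is not a definitive tokenizer.

```python
import re
from typing import List


def split_camel_case(identifier: str) -> List[str]:
    """Split an identifier into elements at its case transitions.

    Assumed rules (illustrative, not a standard):
      - a run of capitals followed by a capitalised word is treated
        as an acronym, e.g. "HTTPServer" -> "HTTP", "Server"
      - an optional capital followed by lower-case letters is one word
      - any remaining run of capitals, or a run of digits, is one element
    """
    pattern = r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+"
    return re.findall(pattern, identifier)


if __name__ == "__main__":
    for name in ("CamelCase", "writingtextlikethis", "HTTPServer"):
        print(name, split_camel_case(name))
```

An identifier written entirely in lower case comes back as a single element, because it contains no case cue from which the word boundaries could be recovered.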

We are reminded of two other useful and related references:

  • Green, T. R. G. (1990). Programming Languages as Information Structures. Psychology of Programming. J.-M. Hoc, T. R. G. Green, R. Samurçay and D. J. Gilmore. London, Academic Press: 118-137.
  • Payne, S. J., M. E. Sime, et al. (1984). Perceptual Structure Cueing in a Simple Command Language. International Journal of Man-Machine Studies 21: 19-29.

An interesting point was raised when somebody asked: 'surely there are more interesting/useful topics to perform research on!'. This is undeniably true. CamelCase (and identifier naming in general), however, is the stuff of programmer arguments, which indicates that programming style is a topic that will remain in fashion for some considerable time.
