Stats: Object oriented features of R (December 19, 2005)



If you want to do any serious data analysis in R, you need to learn some of its object oriented features. The term "object oriented" is difficult to define. Wikipedia provides the following definition:

In computer science, object-oriented programming, OOP for short, is a computer programming paradigm. The idea behind object-oriented programming is that a computer program is composed of a collection of individual units, or objects, that act on each other, as opposed to a traditional view in which a program is a list of instructions to the computer. Each object is capable of receiving messages, processing data, and sending messages to other objects. Object-oriented programming is claimed to give more flexibility, easing changes to programs, and is widely popular in large scale software engineering. Furthermore, proponents of OOP claim that OOP is easier to learn for those new to computer programming than previous approaches, and that the OOP approach is often simpler to develop and to maintain, lending itself to more direct analysis, coding, and understanding of complex situations and procedures than other programming methods. (Source: en.wikipedia.org/wiki/Object-oriented_programming)

Wikipedia then presents six fundamental concepts associated with OOP. I quote each one below and follow most of them with an example from R.

* Class — the unit of definition of data and behavior (functionality) for some kind-of-thing, a class (for example, Dog) is the basis of modularity and structure in an object-oriented computer program. A class should typically be recognizable to a non-programmer familiar with the problem domain, and the code for a class should be coherent and decoupled (as should the code for any good pre-OOP function). With such modularity, the structure of a program will correspond to the aspects of the problem that the program is intended to solve.

* Object — an instance of a class, an object (for example, "Rin Tin Tin" the Dog) is the run-time manifestation of a particular exemplar of a class. Each object has its own data, though the code within a class is shared for economy.

In R, there is an lm class for the output of a linear regression model. For example, the statement:

bivariate.model.1 <- lm(y~x1+x2)

creates an object, bivariate.model.1, of class lm.
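You can check the class of an object yourself. The sketch below uses simulated data, because the statement above does not say what y, x1, and x2 contain; the values are assumptions made only so that the call runs.

```r
# Simulated stand-in data; the original statement does not define y, x1, or x2.
set.seed(1)
x1 <- rnorm(20)
x2 <- rnorm(20)
y  <- 1 + 2 * x1 - x2 + rnorm(20)

bivariate.model.1 <- lm(y ~ x1 + x2)

class(bivariate.model.1)            # "lm"
inherits(bivariate.model.1, "lm")   # TRUE
is.list(bivariate.model.1)          # TRUE: the object is a list with a class attribute
```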

* Encapsulation — a type of privacy applied to the data and some of the methods (that is, functions or subroutines) in a class, encapsulation ensures that an object can be changed only through established channels (namely, the class's public methods). Each object exposes an interface — those public methods, which specify how other objects may read or modify it. An interface can prevent, for example, any caller from adding a list of children to a Dog when the Dog is less than one year old.

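R's S3 classes do not enforce this kind of privacy (an lm object is a list whose components you can read or change directly), but accessor functions play the role of the public interface. The update function takes an existing lm object and refits the model after adding or removing terms. The coef function extracts the model coefficients from an lm object.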
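Here is a short sketch of both functions. The built-in mtcars data set and the object names full.model and reduced.model are assumptions chosen for illustration, not part of the original example.

```r
# mtcars ships with R; it stands in for whatever data you are modeling.
full.model <- lm(mpg ~ wt + hp, data = mtcars)

# update() refits the model with a modified formula: here, remove hp.
reduced.model <- update(full.model, . ~ . - hp)

coef(full.model)         # named vector with (Intercept), wt, hp
coef(reduced.model)      # named vector with (Intercept), wt
formula(reduced.model)   # mpg ~ wt
```

Neither call required you to know how the lm object stores its coefficients or its formula internally.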

* Inheritance — a mechanism for creating subclasses, inheritance provides a way to define a (sub)class as a specialization or subtype or extension of a more general class (as Dog is a subclass of Canidae); a subclass acquires all the data and methods of all of its superclasses, but it can add or change data or methods as the programmer chooses. Inheritance is the "is-a" relationship: a Dog is-a Canidae. This is in contrast to composition, the "has-a" relationship, which user-defined datatypes brought to computer science: a Dog has-a mother (another Dog) and has-a father, etc.

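Several classes in R inherit from the lm class. For example, the glm class, used for generalized linear models, has the class vector c("glm", "lm"), so any method written for lm objects is available to glm objects unless a glm-specific method overrides it. The lme class (from the nlme package), used for linear mixed effects models, supports many of the same functions, but it does not formally inherit from lm.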
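The class vector of a fitted model shows its inheritance directly. A small sketch, again assuming the built-in mtcars data as a stand-in:

```r
linear.fit   <- lm(mpg ~ wt, data = mtcars)
logistic.fit <- glm(am ~ wt, family = binomial, data = mtcars)

class(linear.fit)               # "lm"
class(logistic.fit)             # "glm" "lm": glm first, then its parent class lm
inherits(logistic.fit, "lm")    # TRUE
inherits(linear.fit, "glm")     # FALSE: inheritance runs in one direction only
```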

* Abstraction — the ability of a program to ignore the details of an object's (sub)class and work at a more generic level when appropriate; For example, "Rin Tin Tin" the Dog may be treated as a Dog much of the time, but when appropriate he is abstracted to the level of Canidae (superclass of Dog) or Carnivora (superclass of Canidae), and so on.

The coef function produces the same type of result, a named vector of estimated coefficients, whether it is given an lm object or a glm object.
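This means you can write code that works at the generic level and never asks which kind of model it received. In the sketch below, slope.for is a hypothetical helper and mtcars is an assumed stand-in data set.

```r
linear.fit   <- lm(mpg ~ wt, data = mtcars)
logistic.fit <- glm(am ~ wt, family = binomial, data = mtcars)

# Hypothetical helper: it relies only on coef() and never checks the class.
slope.for <- function(model, term) unname(coef(model)[term])

slope.for(linear.fit, "wt")     # slope on the mpg scale
slope.for(logistic.fit, "wt")   # slope on the log-odds scale
```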

* Polymorphism — polymorphism is behavior that varies depending on the class in which the behavior is invoked. For example, the result of bark() for a Dog would differ from the result of bark() for a Jackal; and in a more sophisticated animal-emulation program, bark() would differ for a Chihuahua and a Saint Bernard.
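The bark() example can be sketched with R's S3 system, where a generic function dispatches on the class attribute of its argument. All of the names below (new_dog, new_jackal, bark, and the class labels) are hypothetical and exist only for illustration.

```r
# Hypothetical constructors: an S3 object is a list with a class attribute.
new_dog    <- function(name) structure(list(name = name), class = c("Dog", "Canidae"))
new_jackal <- function(name) structure(list(name = name), class = c("Jackal", "Canidae"))

# The generic only dispatches; the methods carry the behavior.
bark <- function(x, ...) UseMethod("bark")
bark.Dog     <- function(x, ...) paste(x$name, "says Woof")
bark.Jackal  <- function(x, ...) paste(x$name, "says Yip")
bark.Canidae <- function(x, ...) paste(x$name, "makes a canid noise")  # fallback method

bark(new_dog("Rin Tin Tin"))   # "Rin Tin Tin says Woof"
bark(new_jackal("Jack"))       # "Jack says Yip"

# A class with no method of its own falls back to the Canidae method.
fox <- structure(list(name = "Fox"), class = c("Fox", "Canidae"))
bark(fox)                      # "Fox makes a canid noise"
```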

The predict function produces predicted values for an lm object. For a glm object, it also produces predicted values, but lets you specify whether you want them on the scale of the linear predictor (after the link function has been applied) or on the original response scale. The coef function for an lme object (in contrast to lm and glm objects) is more complex because there are estimates at the various levels of the linear mixed effects model (e.g., estimates of coefficients between subjects and coefficients within subjects).
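A sketch of the two behaviors of predict follows. The mtcars data and the new.cars data frame are assumptions introduced for illustration.

```r
linear.fit   <- lm(mpg ~ wt, data = mtcars)
logistic.fit <- glm(am ~ wt, family = binomial, data = mtcars)

# Hypothetical new observations: car weights in units of 1000 lb.
new.cars <- data.frame(wt = c(2.5, 3.5))

# For an lm object, predictions are on the scale of the response (mpg).
predict(linear.fit, newdata = new.cars)

# For a glm object, the type argument selects the scale.
link.scale     <- predict(logistic.fit, newdata = new.cars, type = "link")      # log-odds
response.scale <- predict(logistic.fit, newdata = new.cars, type = "response")  # probabilities

# The inverse logit maps the link scale to the response scale.
all.equal(unname(plogis(link.scale)), unname(response.scale))   # TRUE
```

The call is predict() in both cases; R picks the lm or glm version based on the class of the first argument.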

When I get a chance, I want to discuss the difference between S3 and S4 objects in R. Here is a reference that discusses S3 and S4 objects:

www.stat.auckland.ac.nz/~paul/Talks/Tokyo/recent.pdf

This page was written by Steve Simon while working at Children's Mercy Hospital. Although I do not hold the copyright for this material, I am reproducing it here as a service, as it is no longer available on the Children's Mercy Hospital website.