Some of my students in the Introduction to SAS class were having trouble reading in a tab-delimited text file, and it’s not too surprising, because some of the student in the Introduction to R class were having problems with the same file. Here’s some details about the data set, what problems it caused, and a couple of ways that you could fix it.

The data file is a freely available data set on the DASL (Data and Story Library) website,

lib.stat.cmu.edu/DASL/Datafiles/homedat.html

Screenshot of the website lib.stat.cmu.edu/DASL/Datafiles/homedat.html

Screenshot of the website lib.stat.cmu.edu/DASL/Datafiles/homedat.html

Notice how this website mixes the data in with the description of the bad. This is a bad practice, for a variety of reasons, but for a data set that is free, you shouldn’t complain. The easiest way to read this into SAS is to cut-and-paste the text file into a text editor like notepad and save the file in your data subdirectory.

Screenshot of the notepad editor with the data cut-and-pasted from the website

Screenshot of the notepad editor with the data cut-and-pasted from the website

There are a couple of subtle things that you should notice about this text file. First, notice how nicely aligned all the data values are. The second, third, and fourth columns of data, for example, always starts exactly in columns 9, 17, and 25. The evenness and the fact that the column starts are exactly 8 characters apart is a pretty good hint that this is a tab delimited file. It could be a space delimited file, with multiple spaces instead of a single tab, though, so you should investigate further. One additional hint, which I can’t show with a static screenshot is how the cursor will tend to jump as you move left and right. You wouldn’t see that sort of jump with a space delimited file.

A second thing to notice is that there is a blank link between the variable names and the data. This makes the file easier to read on the screen, but is a minor problem for SAS. I’ll leave that blank line in for now.

A third thing to notice, and this is super subtle, is that the variable names do not line up with the data beneath them. This is a hint that the first row is NOT tab delimited, but space delimited. You can see that by scrolling left and right in the very first line of the file. No jumping! This causes all sorts of problems.

So here is the SAS code that I wrote to read in the file.

* housing_import.sas;
* written by Steve Simon;
* February 26, 2018;

libname MEDB5507
  "c:/Users/simons/Dropbox/u/DBHI/classes/2018a_sas/dat";
filename housing
  "c:/Users/simons/Dropbox/u/DBHI/classes/2018a_sas/dat/housing.txt";
proc import
    datafile=housing
    dbms=dlm
  out=MEDB5507.housing_v01 replace;
  delimiter='09'x;
  getnames=yes;
run;
proc print
    data=MEDB5507.housing_v01 (obs=10);
  title "Printout of housing_v01";
run;

When you run the code, you get a strange warning.

Number of names found is less than number of variables found.
Name PRICE  SQFT  AGE FEATS   NE    CUST     COR         TAX truncated to
PRICE__SQFT__AGE_FEATS___NE____C.
Problems were detected with provided names.  See LOG.

There are two warnings actually. The first is that the number of names at the top of the file does not match the number of variables found in the data below. The second warning is that the name of the first variable had to be truncated.

SAS is having problems with this file because it can’t separate the eight names found in the first line of code. Those eight names were separated by one or more blanks, and SAS was expecting (as you told it by the delimiter=’09’x statement) that those names would be separated by tabs.