P.Mean: Computing a difference between the first and last variables (created 2008-10-20).

Hello, I have seen that you have provided some answers to people's SPSS questions, and I was hoping you could help me. I have a basic question that I imagine can be answered quite easily through syntax, but I don't know how to do it. I have a data set with multiple rows, where each row is data for an individual (pretty basic). I have about 50 variables which are time points of data. I guess you could think of it set up as individuals' scores on some measure at various time points where the columns go, var1day1, var2day1, var3day1, var4day1, var2day1, var2day2, var2day3, var2day4, var3day1, var3day2, var3day3, var3day4.......var1day50, var2day50, var3day50, var4day50. What I need to do is quite simple: I'm creating a variable which is the score at the first time point of var1 (var1day1) minus the score at the last time point of that same variable (var1dayX). However, it gets complicated because for some cases the last time point is the variable day50, for some individuals the last time point is variable day3, and there's everything in between. If all individuals had the same time points (which ran up through day 50), I would simply say COMPUTE NEWVAR=VAR1DAY1-VAR1DAY50. But that obviously won't work because not every case has a day50. I also can't tell it to subtract the highest score, because it's not always true that an individual's last score was their highest score.

In my new career as an independent statistical consultant, I won't be using SPSS as much anymore. It costs a lot more than R, and ease of use is a non-issue when I'm running all the data analyses. Still, I'll try to answer the simpler SPSS questions. It turns out that the algorithm for this problem works well in most statistical software packages.

You can do this in three steps.

1. You need to restructure your data so that var1day1, var1day2, ..., var1day50 are all in a single column. If there were 30 rows in the original data set corresponding to 30 subjects, there would now be 30*50=1,500 rows in the new data set.
2. You need to sort by subject and by day and toss out any missing values.
3. You aggregate the data by subject, using the first() function and last() function. The difference between these two aggregated variables (first minus last) is what you are after.
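
Since the same three steps carry over to other software, here is a minimal sketch of them in plain Python rather than SPSS syntax. The data values, the `id` key, the `PREFIX` constant, and the function name `first_minus_last` are all made up for illustration, and missing scores are assumed to be stored as `None`.

```python
# Hypothetical wide-format data: one dict per subject, None marks a missing score.
wide = [
    {"id": 1, "var1day1": 10, "var1day2": 12, "var1day3": 7, "var1day4": None},
    {"id": 2, "var1day1": 5, "var1day2": None, "var1day3": None, "var1day4": None},
    {"id": 3, "var1day1": 8, "var1day2": None, "var1day3": 9, "var1day4": 3},
]

PREFIX = "var1day"  # assumed naming pattern: prefix followed by the day number


def first_minus_last(rows, prefix=PREFIX):
    # Step 1: restructure to long format, one (subject, day, score) row
    # per subject and time point.
    long_rows = []
    for row in rows:
        for name, score in row.items():
            if name.startswith(prefix):
                day = int(name[len(prefix):])
                long_rows.append((row["id"], day, score))

    # Step 2: toss out missing values, then sort by subject and by day.
    long_rows = [r for r in long_rows if r[2] is not None]
    long_rows.sort(key=lambda r: (r[0], r[1]))

    # Step 3: aggregate by subject, keeping the first and last score seen.
    first, last = {}, {}
    for subject, day, score in long_rows:
        first.setdefault(subject, score)  # only set on the first occurrence
        last[subject] = score             # overwritten until the final occurrence

    # First observed score minus last observed score for each subject.
    return {subject: first[subject] - last[subject] for subject in first}


diffs = first_minus_last(wide)
print(diffs)  # {1: 3, 2: 0, 3: 5}
```

Two consequences of this approach are worth noticing in the sketch: a subject with only one observed time point gets a difference of 0, and if the day 1 score itself is missing, the "first" value is the earliest non-missing score rather than day 1.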