SAS day 56: ANYALPHA
Once we had an excel sheet with an extreme variety of date formats that don’t make sense. Is there a method to clean the data so they will be meaningful dates?
After observing the data structure, we notice it either starts with a letter or number. Therefore, we will apply the miraculous ANYALPHA function to detect the first string is a letter or a number.
ANYALPHA(string, i) : Search a character string for the first alphabetic character starts at a specific ith position; if no specific position is assigned then it searches from the beginning.
- if the first string is a letter then set the newdate to missing.
if the first string is a number then set the newdate using the input function
Note, we need to minus a constant “-21916” if the date range is absurd.
data sample1; format newdate date9.; set sample; if anyalpha(date)^=1 then do; if anyalpha(scan(Date,2,'-'))=1 then newdate=input(Date,date9.); else if length(Date)<9 then newdate=input(Date,8.)-21916; end; if anyalpha(date)=1 then alpha_flag="Y"; run;
Now we can see the pretty newdate we come up with.
While i got very frustrating to work with the raw data that doesn’t make sense, I have learned so much in these situations. I can view it as a challenge or an opportunity.
Thanks to my colleague Yajnes that always generously sharing all the SAS tricks with me!