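Here’s a sample SAS program that shows how to remove duplicates. This question was asked in a Synchrony interview.
Here’s the input data set, which contains a duplicate row (whale shark 40 appears twice). The period in the humpback row marks a missing Family value.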
beluga whale 15
whale shark 40
basking shark 30
gray whale 50
mako shark 12
sperm whale 60
dwarf shark .5
whale shark 40
humpback . 50
blue whale 100
killer whale 30
SAS Program
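* Read the raw data into a data set named MARINE;
DATA marine;
   INFILE 'c:\MyRawData\Sealife.dat';
   INPUT Name $ Family $ Length;
RUN;
* Sort the data and drop observations with duplicate BY values;
PROC SORT DATA = marine OUT = seasort NODUPKEY;
   BY Family DESCENDING Length;
RUN;
* Print the sorted, de-duplicated data;
PROC PRINT DATA = seasort;
   TITLE 'Whales and Sharks 1';
RUN;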
The DATA step reads the raw data from a file called Sealife.dat and creates a SAS data set named MARINE.
PROC SORT then rearranges the observations by family in ascending order and, within each family, by
length in descending order. The OUT= option writes the sorted data into a new data set named SEASORT, leaving MARINE unchanged. Missing values sort lowest, so the humpback observation, whose Family is missing, comes first.
Spotlight
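The NODUPKEY option of PROC SORT eliminates any observation whose BY values (here, Family and Length) duplicate those of an earlier observation, keeping only the first one. It compares only the BY variables, not the whole row. In this example the second whale shark 40 row is dropped, leaving 10 of the 11 observations.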
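To check which observations NODUPKEY actually discarded, PROC SORT also accepts a DUPOUT= option that writes the removed observations to a separate data set. The following is a minimal sketch, not part of the original program: it reads the same rows inline with DATALINES so it runs without the external file, and the data set name SEADUPS is chosen here purely for illustration.

```sas
* Sketch only. Inline copy of the sample data so no external file is needed;
DATA marine;
   INPUT Name $ Family $ Length;
   DATALINES;
beluga whale 15
whale shark 40
basking shark 30
gray whale 50
mako shark 12
sperm whale 60
dwarf shark .5
whale shark 40
humpback . 50
blue whale 100
killer whale 30
;
RUN;

* NODUPKEY keeps the first observation for each Family and Length pair.
  DUPOUT= sends the discarded observations to SEADUPS, a name assumed here;
PROC SORT DATA = marine OUT = seasort DUPOUT = seadups NODUPKEY;
   BY Family DESCENDING Length;
RUN;

* List the observations that were removed;
PROC PRINT DATA = seadups;
   TITLE 'Observations removed by NODUPKEY';
RUN;
```

With the sample data, SEADUPS should hold only the repeated whale shark 40 row, which is a quick way to confirm that nothing else was dropped.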
The output from PROC PRINT looks like this:
Whales and Sharks 1

Obs    Name        Family    Length
  1    humpback                50.0
  2    whale       shark       40.0
  3    basking     shark       30.0
  4    mako        shark       12.0
  5    dwarf       shark        0.5
  6    blue        whale      100.0
  7    sperm       whale       60.0
  8    gray        whale       50.0
  9    killer      whale       30.0
 10    beluga      whale       15.0
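Because NODUPKEY compares only the BY variables, two different animals with the same Family and Length would also be treated as duplicates. If the goal is to drop only rows that match in every variable, one common approach is to list all variables in the BY statement with the special name _ALL_. The sketch below assumes that goal; the output data set name SEAUNIQUE is hypothetical, and the data is again read inline so the example is self-contained.

```sas
* Sketch only. Inline copy of the sample data so no external file is needed;
DATA marine;
   INPUT Name $ Family $ Length;
   DATALINES;
beluga whale 15
whale shark 40
basking shark 30
gray whale 50
mako shark 12
sperm whale 60
dwarf shark .5
whale shark 40
humpback . 50
blue whale 100
killer whale 30
;
RUN;

* BY _ALL_ makes every variable part of the key, so only rows that are
  identical in Name, Family and Length count as duplicates;
PROC SORT DATA = marine OUT = seaunique NODUPKEY;
   BY _ALL_;
RUN;

PROC PRINT DATA = seaunique;
   TITLE 'Fully distinct rows';
RUN;
```

For this particular data the result should still contain 10 observations, since the only repeated key is also a fully identical row. The rows come out ordered by Name, then Family, then Length, rather than by Family and descending Length.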