The Easy Way to Remove Duplicates in SAS

Here’s a sample SAS program that shows how to remove duplicates. This question was asked in a Synchrony interview.

Here’s the input data set, which contains a duplicate row (whale shark 40 appears twice):

beluga  whale 15
whale   shark 40
basking shark 30
gray    whale 50
mako   shark 12
sperm  whale 60
dwarf  shark .5
whale  shark 40
humpback   . 50
blue   whale 100
killer whale 30

SAS Program

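* Read the raw data into a SAS data set named MARINE;
DATA marine;
 INFILE 'c:\MyRawData\Sealife.dat';
 INPUT Name $ Family $ Length;
RUN;

* Sort by Family, then by descending Length, dropping rows with duplicate BY values;
PROC SORT DATA = marine OUT = seasort NODUPKEY;
 BY Family DESCENDING Length;
RUN;

PROC PRINT DATA = seasort;
 TITLE 'Whales and Sharks 1';
RUN;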

The DATA step reads the raw data from a file called Sealife.dat and creates a SAS data set named MARINE.

Then PROC SORT rearranges the observations by family in ascending order and, within each family, by length in descending order. The OUT= option writes the sorted data into a new data set named SEASORT, leaving MARINE unchanged.
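If you don't have Sealife.dat on disk, the same steps can be tried with the rows read inline. This is a minimal sketch rather than the original program: it assumes the eleven sample rows shown earlier and swaps the INFILE statement for DATALINES. Everything else is unchanged.

```sas
* Self-contained variant: read the sample rows inline instead of from Sealife.dat;
DATA marine;
 INPUT Name $ Family $ Length;
 DATALINES;
beluga whale 15
whale shark 40
basking shark 30
gray whale 50
mako shark 12
sperm whale 60
dwarf shark .5
whale shark 40
humpback . 50
blue whale 100
killer whale 30
;
RUN;

* Sort by Family, then by descending Length, dropping rows with duplicate BY values;
PROC SORT DATA = marine OUT = seasort NODUPKEY;
 BY Family DESCENDING Length;
RUN;

PROC PRINT DATA = seasort;
 TITLE 'Whales and Sharks 1';
RUN;
```

With list input, the single period in the humpback row is read as a missing value for Family.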

Spotlight

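The NODUPKEY option of PROC SORT removes observations with duplicate BY values. Here, for each combination of Family and Length, only the first observation is kept. Variables that are not in the BY statement (such as Name) are not compared.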

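The output from PROC PRINT looks like this. The second whale shark row has been dropped, and the humpback row comes first because its missing Family value sorts before any other value: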

Whales and Sharks 1
 Obs    Name        Family    Length
   1    humpback                50.0
   2    whale       shark       40.0
   3    basking     shark       30.0
   4    mako        shark       12.0
   5    dwarf       shark        0.5
   6    blue        whale      100.0
   7    sperm       whale       60.0
   8    gray        whale       50.0
   9    killer      whale       30.0
  10    beluga      whale       15.0
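Because NODUPKEY compares only the BY variables, it can help to see which rows were discarded. One option is the DUPOUT= option of PROC SORT, which writes the removed observations to a separate data set. The sketch below assumes the MARINE data set from the program above. The data set names SEADUPS and SEAALL are illustrative and not part of the original program. The second step uses BY _ALL_ so that only rows identical in every variable are treated as duplicates.

```sas
* Assumes the MARINE data set created by the DATA step in the program above;
* SEADUPS and SEAALL are illustrative names, not from the original program;

* Same sort as before, but write the discarded observations to SEADUPS;
PROC SORT DATA = marine OUT = seasort NODUPKEY DUPOUT = seadups;
 BY Family DESCENDING Length;
RUN;

PROC PRINT DATA = seadups;
 TITLE 'Observations removed by NODUPKEY';
RUN;

* Compare every variable, so only fully identical rows count as duplicates;
PROC SORT DATA = marine OUT = seaall NODUPKEY;
 BY _ALL_;
RUN;
```

With the sample data, SEADUPS should contain only the second whale shark row. Note that BY _ALL_ sorts in ascending order on every variable, so SEAALL will be ordered differently from SEASORT.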

Srini
