I collected this question from my friends: how do you eliminate duplicates in a SAS data set? It was asked in a Synchrony interview.
The input data file contains the following records (name, family, and length; a lone period marks a missing value):
beluga whale 15
whale shark 40
basking shark 30
gray whale 50
mako shark 12
sperm whale 60
dwarf shark .5
whale shark 40
humpback . 50
blue whale 100
killer whale 30
DATA marine;
   INFILE 'c:\MyRawData\Sealife.dat';
   INPUT Name $ Family $ Length;
* Sort the data;
PROC SORT DATA = marine OUT = seasort NODUPKEY;
   BY Family DESCENDING Length;
PROC PRINT DATA = seasort;
   TITLE 'Whales and Sharks 1';
RUN;
The DATA step reads the raw data from a file called Sealife.dat and creates a SAS data set named MARINE.
Then PROC SORT rearranges the observations by family in ascending order and, within each family, by length in descending order. The OUT= option writes the sorted data to a new data set named SEASORT, leaving MARINE unchanged.
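The NODUPKEY option of PROC SORT eliminates any observation whose BY values (here, Family and Length) match those of an observation already kept, so the second "whale shark 40" record is dropped. Note that NODUPKEY compares only the BY variables, not the whole record.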
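To see which observations NODUPKEY discards, one option is the DUPOUT= option of PROC SORT, which writes the removed observations to a separate data set. The sketch below is a variant of the program above, not the original: it reads the same records inline with DATALINES so it runs without the external file, and the data set names SEADUPS and SEAEXACT are made up for illustration. The second PROC SORT uses BY _ALL_, a common idiom for dropping only observations that are identical in every variable.

```sas
* Assumption: same records as Sealife.dat, read inline for a self-contained run;
DATA marine;
   INPUT Name $ Family $ Length;
   DATALINES;
beluga whale 15
whale shark 40
basking shark 30
gray whale 50
mako shark 12
sperm whale 60
dwarf shark .5
whale shark 40
humpback . 50
blue whale 100
killer whale 30
;
RUN;

* NODUPKEY keeps the first observation in each BY group;
* DUPOUT= saves the discarded observations in SEADUPS (hypothetical name);
PROC SORT DATA = marine OUT = seasort DUPOUT = seadups NODUPKEY;
   BY Family DESCENDING Length;
RUN;

PROC PRINT DATA = seadups;
   TITLE 'Observations removed by NODUPKEY';
RUN;

* BY _ALL_ makes every variable part of the key, so only;
* fully identical observations are removed (SEAEXACT is a hypothetical name);
PROC SORT DATA = marine OUT = seaexact NODUPKEY;
   BY _ALL_;
RUN;
```

With this input, SEADUPS should hold only the repeated "whale shark 40" record, since no other pair of observations shares both Family and Length. The two approaches would differ on data where, say, two different sharks had the same length: the first sort would drop one of them, while BY _ALL_ would keep both.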
The output from PROC PRINT looks like this:
Whales and Sharks 1

Obs    Name       Family    Length
  1    humpback               50.0
  2    whale      shark       40.0
  3    basking    shark       30.0
  4    mako       shark       12.0
  5    dwarf      shark        0.5
  6    blue       whale      100.0
  7    sperm      whale       60.0
  8    gray       whale       50.0
  9    killer     whale       30.0
 10    beluga     whale       15.0