Skip to content
April 29, 2011 / oop123

Removing Duplicates From An Array In Java – LinkedHashSet

To remove duplicates from an array, you can use the Java’s collection class LinkedHashSet which doesn’t allow duplicate elements to be in it. Obviously, if you don’t want duplicates in an array in the first place, you may as well use LinkedHashSet to store the elements in the first of place instead of an array, but it’s still good to know how to remove duplicates just in case the situation comes up. You could also use HashSet to replace the LinkedHashSet, and it would be *just* slightly tiny bit faster. However, since LinkedHashSet would retain the orders of the elements in the array, it would make a better general purpose removing-duplicates function.

The following code is a function that removes duplicates from a int array using LinkedHashSet.

public static int[] removeDuplicatesWithSets(int[] array) {
        if (array == null || array.length == 0) { return array; }

        //create a linked hash set that would remove all the duplicates of
        //the array while retaining order
        Set<Integer> set = new LinkedHashSet<Integer>();
        for (int i = 0, len = array.length; i < len; i++) {
            set.add(array[i]);
        }

        int[] noDups = new int[set.size()];
        int count = -1;
        for (Integer num : set) {
            noDups[++count] = num;
        }
        return noDups;
}

Unfortunately, Java’s collections can only work with objects, so there will be a lot of autoboxing; also, all the ints need to be converted to Integers, so it also uses a fair amount of memory.

A function for remove duplicate Objects using generics would look something like this (I’m new to Java, so this may suck/not work properly):

@SuppressWarnings({"unchecked"})
public static  T[] removeDuplicates(T[] array) {
        if (array == null || array.length == 0) return array;

        //create a linked hash set - remove all the duplicates of array while retaining order
        LinkedHashSet<T> set = new LinkedHashSet<T>();
        for (int i = 0; i < array.length; ++i) {
            if (array[i] != null) {
                set.add(array[i]); 
            }
        }

        //create an array of T (using reflection)
        T[] noDup = (T[]) Array.newInstance(array.getClass().getComponentType(), set.size());
        int count = 0;
        for (T element : set) {
            noDup[count++] = element;
        }
        return noDup;
}

Note: If you are checking the elements inside the array for actual equality and not just equal identity, you need to override both the .hashCode() and .equals() of the elements inside the array. This is because LinkedHashSet goes through these steps to determine whether two objects are equal:

  1. 1. Checked whether the two object’s hashCode using .hashCode() are the same or not.
    • If they aren’t, they are not equal and the object is added.
    • If they are, go to step 2.
  2. Check the two elements with .equals().
    • If it’s true, they are equal and the object is not added.
    • Else, the object is added.

In fact, you should also override hashCode() when you override .equals() because the Java Language Specification states that objects that are equal base on .equals() should return the same hash code when .hashCode() are called.

Advertisements

2 Comments

Leave a Comment
  1. xunjieli / Aug 5 2011 1:19 pm

    Simple and cool. Solved my problem. Thanks:)

Trackbacks

  1. Removing Duplicates From an Array in Java – Sort and Remove vs LinkedHashSet « oop123

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: