Count duplicates in a Linked List

Introduction

The linked list is one of the most important concepts and data structures to learn while preparing for interviews. Having a good grasp of Linked Lists can be a huge plus point in a coding interview.

Problem Statement

In this question, we are given a singly linked list. We have to count the number of duplicate nodes in the linked list

Problem Statement Understanding

Let’s first understand the problem statement with help of examples.

Suppose the given list is:

We have to count the number of duplicate nodes in the list.

From the above given linked list, we can see that:

  • Count of each 1 and 8 is 2 in the linked list.
  • While the count of 7 is 1 in the linked list.

So, we can say that duplicates of 1 and 8 exists in the linked list, 1 duplicate each of 1 and 8 exists in the linked list. So, we will return the count of duplicate node in linked list as (1+1) = 2.

If the given linked list is:

For the above linked list, we can see that:

  • Count of each 1, 2, 3 is 2 in the linked list.
  • The count of 5 is 3.
  • While the count of 4 is 1 in the linked list.

So, we can say that duplicates of 1, 2, 3, 5 exists in the linked list, 1 duplicate each of 1, 2 and 3 exists and 2 duplicates of 5 exists. So, we will return the count of duplicate node in linked list as (1+1+1+2) = 5.

Now, I think it is clear from the examples what we have to do in the problem. So let’s move to approach.

Before jumping to approach, just try to think how can you approach this problem?

It's okay if your solution is brute force, we will try to optimize it together.

We will first make use of simple nested list traversal to find the duplicates, although it will be a brute force approach, but it is necessary to build the foundation.

Let us have a glance at brute force approaches.

Approach and Algorithm (Brute Force)

The approach is going to be pretty simple.

  • We will create a variable count and initialize it to 0.
  • Now, we will traverse through the list and for every node, we will traverse from its next node to the end of the list and whenever we will find a match, we will increase the counter and will break out of the loop. Basically here what we are trying to find out is that is this particular node duplicate or not.

This method has a time complexity of O(n2) as it is using nested traversal.

Can we do better?
Yes. We can do better. We can do it in O(n) with the help of hashing.

Let us see the efficient approach.

Approach (Hashing)

In this approach, we will use hashing.

  • Firstly, we will create an unordered set, and we will insert the data of the head in the set.
  • We will also create a counter whose initial value will be 0.
  • Now, we will traverse through the list, starting from the next node, and for every node, we will check if that node is already present in the set or not. If it is present, we will increment the counter. We will also keep inserting the nodes, that we are traversing, in the set.

Algorithm

  • Base Case - If the head is NULL, return 0.
  • Create an unordered set, say s, and a variable count with an initial value of 0.
  • Insert the head → data in the set.
  • Traverse from the next node of the head till the end of the list.
  • In every iteration, check if the node’s data is already present in the set or not. If present, increment the count.
  • Insert the current node’s data in the set.
  • In the end, return the count variable.

Dry Run

Code Implementation

#include 
#include 
using namespace std;

struct Node {
    int data;
    Node* next;
};

void insert(Node** head, int item)
{
    Node* temp = new Node();
    temp->data = item;
    temp->next = *head;
    *head = temp;
}

int countNode(Node* head)
{
    if (head == NULL)
       return 0;;

    unordered_set s;
    s.insert(head->data);

    int count = 0;
    for (Node *curr=head->next; curr != NULL; curr=curr->next) {
        if (s.find(curr->data) != s.end())
             count++;
 
        s.insert(curr->data);
    }

    return count;
}
 

int main()
{
    Node* head = NULL;
    insert(&head, 8);
    insert(&head, 8);
    insert(&head, 1);
    insert(&head, 7);
    insert(&head, 1);
 
    cout << countNode(head);
 
    return 0;
}


import java.util.HashSet;
 
public class PrepBytes
{
 

static class Node
{
    int data;
    Node next;
};
static Node head;

static void insert(Node ref_head, int item)
{
    Node temp = new Node();
    temp.data = item;
    temp.next = ref_head;
    head = temp;
     
}

static int countNode(Node head)
{
    if (head == null)
    return 0;;

    HashSets = new HashSet<>();
    s.add(head.data);

    int count = 0;
    for (Node curr=head.next; curr != null; curr=curr.next)
    {
        if (s.contains(curr.data))
            count++;
 
        s.add(curr.data);
    }

    return count;
}

public static void main(String[] args)
{
 
    insert(head, 8);
    insert(head, 8);
    insert(head, 1);
    insert(head, 7);
    insert(head, 1);
 
    System.out.println(countNode(head));
}
}

Output

2

Time Complexity: O(n), as list traversal is needed.

So, in this article, we have tried to explain the most efficient approach to find duplicates in a given linked list. This is an important question when it comes to coding interviews. If you want to solve more questions on Linked List, which are curated by our expert mentors at PrepBytes, you can follow this link Linked List.

Previous post LinkedList removeFirst method in Java
Next post Rearrange a given Linked List in-place

Leave a Reply

Your email address will not be published. Required fields are marked *