Algorithm: The implementation of Trie Tree (C++)

Question:

There is a text file which includes many lines. And each line has only one word or phrase. Now, please implement a program to get the prefix of each word. (Please print each of prefix word behind the original word and separate by a blank space.) For example,

<Word.txt>

chock choc

choice choi

clamp cl

cant ca

cool coo

color col

Solution:

The basic idea of my solution for this issue is to take advantage of Trie Tree.

In this tree, there is a root node which doesn’t have parent node. And each of nodes has 26 child nodes at most. 26 child nodes will be used to save the letter, “a” to “z” respectively.

For inserting each of word, we should pick up each character of word and put it to the proper node. For example, if we want to insert word “chock” to the prefix word. We should pick up the first character ‘c’ and put it to the child nodes of root node. Then, we should pick up the second character ‘h’ and put it to the child nodes of ‘c’ node which is the child node for root node. And next, we will insert the rest characters of word to the tree in the same way.

So, if we insert above words one by one, we should able to get a prefix tree as following.

Algorithm: The implementation of Trie Tree (C++)

Since the prefix of word is a string which could be used to identify a word, we can find that the last character of each prefix word is the last node whose parent has more than 1 child. For example, the prefix word of chock is “choc”. And the end of “choc”, the fourth character, is “c” which has been marked as RED tag. You can see that this node is the last node whose parent “o” has 2 children, “c” and “I”. So, we can find all the end position of prefix for each word.

Design:

After our analysis, we can design 2 classes, Node and PrefixTree.


class Node
{
  public:
    Node* parent ;
    char letter ;
    Node* child[26] ;
    int childNum ;

In class Node, we have 4 members. Member “parent” will direct to the parent node of current node. Member “letter” is used to save the character of the each word. Array “child[26]” is used to save the child nodes for current node. The last member “childNum” will show you the number of children that current node has.


class PrefixTree
{
  private:
    Node* root ;
    char** words ;
  public:
    PrefixTree(char* path) ;
    void getPrefix() ;
    
  
  private:
    char** readWordsFile(char* path) ;
    void insertNode(Node* parent, char* word, int cur) ;
    void createPrefixTree() ;
    
};

In class PrefixTree, we just have 2 variable members. First one is “root” which direct to the root node of prefix tree. And second node is “words”, which is used to store all the words that will be analyzed.

Of course, we also design 4 class methods for loading words, creating prefix tree, inserting node and analyzing prefix word in respectively.

Coding:

Following is all my codes for implement. You can understand them by the comments.


/*

COPYRIGHT ANNOUNCEMENT
Copyright ©2010 Xianyi.Ye
Following codes only could be used to study and research for individual. You are not allowed to use these codes for any commercial purposes.
Sunday, May 09, 2010
*/
#include "stdafx.h"
#include <iostream>
#include <stdio.h> 
#include <stdlib.h> 
#include <malloc.h>
using namespace std ;
class Node
{
  public:
    Node* parent ;
    
    char letter ;
    
    Node* child[26] ;
    
    int childNum ;
  public:
    Node()
    {
      parent = NULL ;
      letter = NULL ;
      childNum = 0 ;
      for(int i=0 ; i<26 ; i++)
      {
        child[i] = NULL ;
      }
    }
};
class PrefixTree
{
  private:
    Node* root ;
    char** words ;
  public:
    PrefixTree(char* path) ;
    void getPrefix() ;
    
  
  private:
    char** readWordsFile(char* path) ;
    void insertNode(Node* parent, char* word, int cur) ;
    void createPrefixTree() ;
    
};
PrefixTree::PrefixTree(char* path)
{
  words = readWordsFile(path) ;
  createPrefixTree() ;
}
void PrefixTree::getPrefix()
{
  int word_Count = _msize(words)/sizeof(char*) ;
  for(int i=0 ; i<word_Count ; i++)  //Travel each word of the line
  {
    //cout<<words[i]<<" " ;  //Output each word
                //We should be able to print the word by this statement.
                //However, since the last character of the word is '/n' or EOF,
                //our words and prefix won't be on the same line if we print the word by "cout" directly.
                //And it's a little complicated to control our output format.
                //Therefore, we will output the word when we travel the tree.
    int word_length = strlen(words[i])-1 ;
    Node* node = root ;
    for(int j=0 ; j<word_length ; j++)  //Travel each letter of the word and
    {                  //And get the node of the last letter of word
      char letter = *((words[i])+j) ;
      cout<<letter ;          //Output the letter of the word.
                      //So, we can get the whole of word when the FOR loop is finished. 
      node = node->child[letter%26] ;
    }
                      //Now, node should direct to the last letter of word 
    cout<<" " ;
                      //Backtracking and find first matched node whose parent has more than 1 child.
                      //Then, this node is the end letter for prefix word.
    node = node->parent ;
    while(node->parent->childNum==1)
    {
      node = node->parent ;
    }
    Node* end = node ;          //Now, "end" is what we wanted.
    //cout<<"end="<<node->letter<<endl ;
    node = root ;
    for(j=0 ; j<word_length ; j++)    //Travel the tree to output each letter of prefix word 
    {                  //until we arrive at the End Node.
      char letter = *((words[i])+j) ;
      node = node->child[letter%26] ;
      cout<<letter ;
      if(node==end)
        break ;
    }
    cout<<endl ;
  }
}
char** PrefixTree::readWordsFile(char* path)
{
  FILE *fp; 
  char buf[256] ;
  if ((fp=fopen(path, "rt"))==NULL) 
  { 
    printf("open file error!!/n"); 
    return NULL; 
  }
  int lineCount = 0;
  while(fgets(buf, 256, fp) != NULL) 
  { 
    lineCount++ ;
  }
  //cout<<"lineCount="<<lineCount<<endl ;
  char** words = new char*[lineCount] ;
  //init words buffer
  for(int i=0 ; i<lineCount ; i++)
  {
    words[i] = new char[256] ;
  }
  fseek(fp,0,SEEK_SET);
  i = 0 ;
  while(fgets(words[i++], 256, fp) != NULL) ;
  return words ;
  
}//end readWordsFile
void PrefixTree::insertNode(Node* parent, char* word, int cur)
{
  int word_length = strlen(word)-1 ;
  if(cur==word_length)
  {
    return ;
  }
  int pos = word[cur]%26 ;
  if(parent->child[pos]==NULL)
  {
    //cout<<"inserting "<<word[cur]<<endl ;
    parent->child[pos] = new Node() ;
    parent->child[pos]->childNum++ ;
    parent->child[pos]->letter = word[cur] ;
    parent->child[pos]->parent = parent ;
    
   }
  else
  {
    parent->child[pos]->childNum++ ;
    //cout<<word[cur]<<" has been inserted."<<endl ;
  }
  
  insertNode(parent->child[pos],word,++cur) ;
}//end insertNode
void PrefixTree::createPrefixTree()
{
  root = new Node() ;
  root->parent = NULL ;
  
  int word_Count = _msize(words)/sizeof(char*) ;
  for(int i=0 ; i<word_Count ; i++)
  {
    //cout<<words[i] ;
    insertNode(root,words[i],0) ;
  }
  
}//end createPrefixTree
int main(int argc, char* argv[])
{
  PrefixTree* tree = new PrefixTree("c://words.txt") ;
  tree->getPrefix() ;
  delete tree ;
  return 0 ;
}

Test case:

<words.txt>

abandon

aboard

absolute

accident

accomplish

list

lively

load

local

objective

oblige

observe

obtain

opposite

optional

ore

organize

ornament

outdoors

outlet

Algorithm: The implementation of Trie Tree (C++)

Any question, please contact me.

[email protected]