English Deutsch Français Italiano Español Português 繁體中文 Bahasa Indonesia Tiếng Việt ภาษาไทย
All categories

I need to read in a list of words (don't know which words or how many so it needs to be dynamic) and calculate the frequency of each. I was thinking of either doing a balanced BST of records or a hash table of records. Each record storing the word and the number of occurrences. Also a global variable storing the total number of words to calculate the frequency with.
Any better suggestions than mine. I am thinking the hash table would be best but I have never implemented one before.

2007-02-07 03:52:41 · 5 answers · asked by Kirstin 2 in Computers & Internet Programming & Design

Is a dynamic array the same as a linked list? If so would a binary search tree not be more efficient?

2007-02-07 04:05:58 · update #1

By calculate the frequency I mean calculate the percentage of each. So I need to store the frequency of each word and the total number of words.

2007-02-07 04:08:10 · update #2

5 answers

Looks like you're going to need the following:

1) Dynamic memory allocation for each new word read in from the file
2) Dynamic arrays whereby each new word is added to the array along with a counter
3) use the hash or sorting algorithm to sort the array
4) use a search algorithm to locate the word in the arrray

Of course, the volume of information may dictate which method above for #3 and #4 you will use; extremely large amounts of data may dictate a hash method as opposed to a sor/search method.

process as follows:

1) read a word from the file
2) If not in the array, add it and initialize the counter
3) if in the array, increment the counter

to avoid excessive memory allocation calls, you would want to implement memory blocks so that you're allocating chunks at a time rather than once for each word. Again though, this depends on the number of words being processed; larger amounts and more frequent processing would dictate the block allocation approach.

Hope this helps...

2007-02-07 04:01:48 · answer #1 · answered by BigRez 6 · 2 0

Both dynamic array and linked list will work. I would use linked list to calculate frequency faster.

Both hash table and BST are well. But for this case, I will sugest to use BST. You need extra job using Hash table to compare the word "eat", "tea" and "ate", that will slow down a bit.

 

2007-02-07 05:15:04 · answer #2 · answered by oohay_member_directory 4 · 0 1

A Hash table will give you the most flexibility if the language you're using supports the data type well.

2007-02-07 03:56:30 · answer #3 · answered by Anonymous · 1 0

Hashtable, all the way. I would hash each string, and if it exists in the hashtable, increment the value. If it doesn't exist, add it and set the value to 1. When you're done you'll have a collection of name / value pairs containing (string, count).

2007-02-07 06:14:33 · answer #4 · answered by Pfo 7 · 0 0

I'd use a good, old-fashioned array.

2007-02-07 04:01:50 · answer #5 · answered by Anonymous · 0 2

fedest.com, questions and answers