CS 246 Lab 12

Write a program called frequency that reads a text file (one character at a time) and counts the number of occurrences of each word. Define a structure

struct item {
  char *word;
  int count;
}

to store the words with their counts in an array. You should use an array of dynamically allocated structures. You may assume that there are no more than 100000 unique words in the input. You may also assume that no word is longer than 256 characters. Each time you read a word, search the array to see if the word is present. If so, increment the count. If not, insert a new entry into the array. Keep the array in increasing order by word. This will require that you slide the last part of the array up by 1 spot to make room for the new entry, unless the new entry goes at the end. Since your array is always in order, you can search it using binary search. When you are done, sort the array into decreasing order by count and print the words and their frequencies, one per line. Use the functions bsearch and qsort to do the searching and sorting. Convert all letters to lower case when counting. Note: you'll want to copy the word before putting it into the array (strdup).

The program should take any number of file arguments and count the words cumulatively for all of the files. If there are no arguments, it should read from stdin.

The lecture note "Introduction to C Programming" has the word counting program that we looked at in class a while back. You can use this to identify individual words. Just regard a word as a contiguous sequence of letters (isalpha). The "Files in C" note has the cat program that shows you how to open files from command line arguments.