I need to write a simple scanner for ASCII files. The scanner should recognize (find) three kinds of tokens in the input file: strings of spaces, single new lines, and words (a word is defined as any sequence of characters not including spaces or new lines).
Everything in the input file can be broken down into sequences/tokens of one of these three types.
I need to write a driver (a main() function) that repeatedly calls a function called getToken(). getToken() reads the next token in the input file and returns to main() the token type and the actual string (lexeme) of the token.
For each call to getToken() (that is, for each token returned), main() outputs the token type (a unique integer is OK to represent the type) and the string (except for new lines, since printing them causes problems). main() also outputs the position in the input file for each token: the line number and the char position on the line of the beginning of the token.
The logic overwhelms me and I can't think it through. Help!
2006-10-03 10:28:24 · 5 answers · asked by Anonymous in Computers & Internet ➔ Programming & Design
An example of the input file's contents is:
Hello 123
second line
hello to the world!
If you need to pass the input file stream name to getToken, you must use a reference parameter.
You will need to use a concept called look-ahead: for space strings or words, you will not know where the token ends until you have read the first character past its last character. That look-ahead character must be saved and used as the first input character the next time getToken() is called. You could store it in a static variable, but finding and re-using it gets convoluted. The easier method is to push the last character back onto the input file stream, as if you had never read it. Use:
ifstreamName.unget();
An example of the program's output is:
Please enter the input filename:
tokentype=3 str=Hello on line #1 at charpos=1
tokentype=2 str= on line #1 at charpos=6
tokentype=3 str=123 on line #1 at charpos=7
2006-10-03 10:32:35 · update #1
NOTE: typos fixed and details added 2:53pm PT!
First, read in the file, and store it in a buffer. Use a pointer to keep track of where you are in the buffer. Initially, it points to the first character.
For your getToken(), use a while loop. Inside the loop, use a switch statement. The cases are used to implement a state transition diagram that describes how to recognize the tokens.
How to design the state transition diagram:
Begin with an initial state. Then add transitions based on the different characters you expect to see. For example, in your case, you'd have one transition for a space, one for a newline, and one for anything else.
Each transition leads to another state, which in turn has transitions based on what character comes next. The state may be a "final state", which corresponds to recognizing a complete token. Then, you return to the initial state and start again.
For example, from your start state, your transition corresponding to spaces goes to another state. From there, if you read another space, stay in the same state (loops back to the same state are allowed). If you read anything else, then transition to a final state, retract your input pointer by one character (because it belongs to the next token), and you've recognized a token corresponding to a sequence of spaces.
The transition from the start state on a newline is to a final state, because the newline token consists of a single character. In this case, no retract (or unget) is required.
The transition from the start state on any other character goes to yet another state, in which you remain as long as you read characters other than spaces or newlines. Once you read a space or a newline, you transition to a final state, recognize the "word" token, retract the input pointer by one character, and you're ready to go back to the start state to recognize the next token.
You should use at least two pointers to the input: one that points to the start of the lexeme, and one that points to the current character, so that when you finish recognizing a token, you know exactly which characters belong to it.
Once you've built your state transition diagram, you can implement it using your while loop and switch statement in getToken(). The cases correspond to states (which should be numbered). Your program begins in the initial state, and returns to it every time it recognizes a token, in order to begin recognizing the next one.
NOTE: it sounds like you're taking a compiler class. I've taught these before, so if you need more help, email me and let me know.
2006-10-03 10:43:22 · answer #1 · answered by James L 5
Well, I'm not going to do your homework for you, but I'll just give some hints & tips.
For any program idea, you should write pseudo-code and break the problem down into separate components. When you write the pseudo-code, you don't need to worry about variables or complex loops. I suggest first writing down the pseudo-code for finding only 1 token type from 1 line in the file.
After that, you can get an idea of the functions that you are going to need. Your teacher won't tell you to write a program which uses a function he/she hasn't taught you yet. Make sure that you completely understand how all your functions work before going on.
Forget about the separate GetToken() procedure for now - you can put the code into GetToken() later. Write down the code to find one token and print it. If you are having trouble with the algorithm, break the code down even further.
Algorithm broken down even further:
Make a test program where you search through a test string (don't bother with loading a file) for one type of token. You will need a loop to search, and you need to copy the token characters to an array in order to print them - remember to add '\n' and zero at the end of the array. The copying code needs a loop too, because you have to copy multiple characters to the array. If you can do that, then expanding the program to what you need is no problem. Make sure you get down the code for finding ONE token first, then add the code for finding the other tokens later.
Here is the basic algorithm in pseudo-code for finding 1 token (ASCII number) in a string:
(I hope you know or have a table for ASCII characters.)
1) get character
2) is it a zero (end of string)? if so, goto end sequence
3) is the character below the ASCII value of '0'? if so, goto 1
4) is the character above the ASCII value of '9'? if so, goto 1
5) copy that character and the characters after it to the array until we get to a space character or the end of the string (aw, you should know how to make a loop for that)
6) goto end sequence
end sequence:
1) add '\n' and zero to end of array
2) print array
You should be able to write a program from that pseudo-code. Get that done first, then add the code for the other tokens and add the code for reporting the token as your program requires.
Get that done, then create the GetToken() procedure and work in the C++ file stream function calls that your teacher taught you.
2006-10-03 11:22:31 · answer #2 · answered by Balk 6
Because you're formatting as a double instead of an integer seems the most logical answer. It's been a long time for me and I never liked C but I had Turbo C once.
2016-03-27 03:42:43 · answer #3 · answered by Anonymous
I think the description pretty much gives the steps you need to perform.
1. Open the file
2. Read line by line (I believe there is a readline function)
3. Explode into an array
4. Do string comparisons.
Sounds pretty simple to me, unless I am missing something.
2006-10-03 10:34:25 · answer #4 · answered by Sanjay 3
I would help if I could, but I have NO idea what you're talking about. Good Luck!
2006-10-03 10:35:50 · answer #5 · answered by linds 2