157. Read N Characters Given Read4
Solution
Overview
Interview Tendencies: Google and Facebook
A long time ago, long ago, so long ago that no one can remember, algorithm interview questions were less popular. Ten years ago big companies mainly filtered the candidates by the university ranks, and the interview questions were like please describe how DDR memory works.
Nowadays there are some tendencies to merge this "old-style interview" with the modern algorithm questions interview. The idea is to ask a question which sounds like algorithmic but checks your knowledge of how do computers work: Round-robin CPU scheduling, C10k problem first solved by nginx, etc.
Is it good or bad? That's a reality to deal with, especially if we speak about Google or Facebook interviews.
Read N Characters Given Read4
Back to the problem, the question is "how does the memory work":
Because of the physical implementation, loading 4 bytes in DDR is faster than to load 1 byte 4 times.
On the majority of computers today, collection of 4 bytes, or 32 bits, is called a word. Most modern CPUs are optimized for the operations with words.
To sum up, the problem is a practical low-level question. The standard approach (Approach 1) to solve it using the internal buffer of 4 characters:
File -> Internal Buffer of 4 Characters -> Buffer of N Characters.
Figure 1. Approach 1: solution with internal buffer.
Once it's done, and you show your understanding of memory operations, the follow-up question is how to speed up. The answer (Approach 2) is quite straightforward. If it's possible, do not use the internal buffer of 4 characters to avoid the double copy:
File -> Buffer of N Characters.
Figure 2. Approach 2: solution without internal buffer.
Approach 1: Use Internal Buffer of 4 Characters
Figure 3. Solution with internal buffer.
Algorithm
Let's use an internal buffer of 4 characters to solve this problem:
File -> Internal Buffer of 4 Characters -> Buffer of N Characters.
Initialize the number of copied characters
copiedChars = 0
, and the number of read characters:readChars = 4
. It's convenient initializereadChars
to4
and then usereadChars != 4
as EOF marker.Initialize an internal buffer of 4 characters:
buf4
.While number of copied characters is less than N:
copiedChars < n
and there are still characters in the file:readChars == 4
:Read from file into internal buffer:
readChars = read4(buf4)
.Copy the characters from internal buffer
buf4
into main bufferbuf
one by one. IncreasecopiedChars
after each character. In the number of copied characters is equal to N:copiedChars == n
, interrupt the copy process and returncopiedChars
.
Implementation
Complexity Analysis
Time complexity: to copy N characters.
Space complexity: to keep
buf4
of 4 elements.
Approach 2: Speed Up: No Internal Buffer
Figure 4. Solution without internal buffer.
This solution is mainly suitable for the languages (C, C++, Golang) where pointers allow to append directly to the primary buffer buf
.
Algorithm
Initialize the number of copied characters
copiedChars = 0
, and the number of read characters:readChars = 4
.While number of copied characters is less than N:
copiedChars < n
and there are still characters in the file:readChars == 4
:Read from file directly into buffer:
read4(buf + copiedChars)
.Increase
copiedChars
:copiedChars += readChars
.
Now
buf
contains at least N characters. Returnmin(n, copiedChars)
.
Implementation
Complexity Analysis
Time complexity: to copy N characters.
Space complexity: .
Took some time to even to completely get the question :(
the Approach 2 might overflow the buffer since the buf is only guaranteed to fit n characters. :|
For example, file is "abcdefgh" and you want to read 5 chars, than buf will overflow with 3 additional bytes.
Old school question. The rational stumped me at first, but it's a good question.
I saw that question in Facebook interview, I couldn't solve that time. Here my two cents (Java)
Thanks for the explanation.
// Global variable int ans = 0; int ans4 = 0; int idx4 = 0; boolean eof = false; char[] buf4 = new char[4]; public int read(char[] buf, int n) { ans = 0; // Reset to reuse read() while(ans < n){ while(ans < n && idx4 < ans4) buf[ans++] = buf4[idx4++]; if(ans == n || eof) return ans; res4 = read4(buf4); // Read other chunk idx4 = 0; // Reset to reuse if(res4 < 4) eof = true; } return ans; }
Am I wrong in understanding this question ? At first question looked tough but it turned out to be very simple one. I didn't get the rationale behind this simple stupid question!!!
FYI, word is not 4 bytes, word is 2 bytes, dword is 4 bytes.
Please correct explanation
Good intro to solution. But definitely 13 years ago at least companies didn't select candidates based on DDR questions. But yes questions used to be much simpler.
The example solutions does not consider consecutive read-by-N, unfortunately.
Consider this case: file = "abcdefg", n = 5. If we call the read() twice, what should the second read() get? If it is real world problem, we would expect the second read() to get 'fg', but the solutions here would get empty string "".
The first read() will call read4 twice, and get 'abcd', 'efg', then the solution returned 'abcde' 5 bytes, and discard 'fg'. The second read() calls read4, and the read4 maintained file pointer is already pointing to the end of file, then read4 returns empty. A complete solution should buffer the extra bytes internally, and prefix the left-over to the next read().
Approach 2 with protection against overflowing primary buffer - allocate an internal buffer for the last read if it might overflow.
class Solution { public: int read(char *buf, int n) { int copiedChars = 0, readChars = 4; while (copiedChars < n && readChars == 4) { if (n - copiedChars < 4) { // internal buffer for final read that may overflow buf char buf4[4]; readChars = read4(buf4); for (int i = 0; i < readChars; ++i) { buf[copiedChars] = buf4[i]; ++copiedChars; } } else { readChars = read4(buf + copiedChars); copiedChars += readChars; } } return min(n, copiedChars); } };
I agree with the approach 2, but definitely not the implementation of approach 2, which takes the risk of overflowing the given buffer.