On big buffers
2015-03-25 20:18:22.019561+00 by
Dan Lyke
8 comments
Edit: Yeah, I'm skeptical.
When In-Memory Computing is Slower than Heavy Disk Usage, Kamran Karimi, Diwakar Krishnamurthy, Parissa Mirjafari.
A quick skim suggests that this is about letting your OS do the buffering rather than blowing out your CPU cache trying to buffer it all yourself, and that incremental asynchronous writes while you're doing other stuff beat serializing everything.
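In other words, something like this (my own sketch of what I think they're comparing, not the paper's code; names and sizes are made up):
# Build everything in memory and write once, vs. write incrementally
# and let the OS page cache do the buffering.
import os, timeit
chunk = "1" * 4096
numChunks = 25000 # roughly 100 MB total; shrink it if that's too much

def in_memory_then_write(path):
    pieces = []
    for i in range(numChunks):
        pieces.append(chunk)
    f = open(path, "wb")
    f.write("".join(pieces))
    f.close()

def incremental_writes(path):
    f = open(path, "wb")
    for i in range(numChunks):
        f.write(chunk) # small writes, the OS buffers them
    f.close()

for fn in (in_memory_then_write, incremental_writes):
    start = timeit.default_timer()
    fn("scratch.bin")
    print fn.__name__ + ": " + str(timeit.default_timer() - start)
os.remove("scratch.bin")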
Via /., but reading the comments from people who've read the paper more thoroughly suggests that these folks are exploiting bad behavior from Python and/or Java, and really don't understand what's happening under the hood.
comments in ascending chronological order:
#Comment Re: On big buffers made: 2015-03-25 21:17:27.525827+00 by:
ebradway
I haven't tested the Java version, but the Python code in the article prepends the one-byte string to the longer string. If you change the code to the appropriate append, the times are exactly as expected. And I sure don't want to wait around while I try to insert a byte at the start of a file a million times.
Here's my test code:
# Modified code from Kamran Karimi (kkarimi@ucalgary.ca)
# Modifications by Eric Wolf
# Short version. Runs each experiment once
import timeit
numAdd = 1 #Additional changes are needed when numAdd = 1000000
totalMemory = 1000000 # bytes
addString = ""
for i in range(0, numAdd):
    addString = addString + "1"
# First part: in-memory
numIter = int(totalMemory / len(addString))
# Prepend the "1" to the string
concatString = ""
start = timeit.default_timer()
for i in range(0, numIter):
    concatString = addString + concatString
stop = timeit.default_timer()
prependTime = stop - start
# Append the "1" to the string
concatString = ""
start = timeit.default_timer()
for i in range(0, numIter):
    concatString = concatString + addString
stop = timeit.default_timer()
appendTime = stop - start
print "Prepend: " + str(prependTime)
print "Append: " + str(appendTime)
Results:
~$ python karimi.py
Prepend: 54.0762400627
Append: 0.134270906448
#Comment Re: On big buffers made: 2015-03-25 21:19:52.966956+00 by:
ebradway
Not certain what's going on with the comment formatter...
#Comment Re: On big buffers made: 2015-03-25 21:32:43.850235+00 by:
ebradway
As for the /. comments about Python and Java... That's why I don't read /. any more.
#Comment Re: On big buffers made: 2015-03-26 05:59:46.394013+00 by:
Jack William Bell
I like big buffers, and I cannot lie!
#Comment Re: On big buffers made: 2015-03-26 06:07:43.536317+00 by:
Jack William Bell
On a more serious note, if someone gave me those test requirements, and a check, I would implement it as a stream with a smart buffer that provided a run-length encoding scheme and wrote out the RLE data set instead of the data added to the buffer. This would handle their primary use case in less than a few hundred bytes of output and still provide reasonable results if they changed the character they were prepending once in a while.
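Something along these lines (a rough illustrative sketch only, made-up names):
# Illustrative sketch of the "smart buffer": store runs of repeated bytes
# as (byte, count) pairs and write those out instead of the raw data.
class RLEBuffer(object):
    def __init__(self):
        self.runs = [] # list of [byte, count]

    def add(self, byte):
        if self.runs and self.runs[-1][0] == byte:
            self.runs[-1][1] += 1 # extend the current run
        else:
            self.runs.append([byte, 1]) # start a new run

    def write(self, f):
        # a million identical adds come out as a single "1,1000000" line
        for byte, count in self.runs:
            f.write("%s,%d\n" % (byte, count))

buf = RLEBuffer()
for i in range(1000000):
    buf.add("1")
out = open("test.rle", "w")
buf.write(out)
out.close()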
Always good to try and anticipate future requirements if it doesn't add extra work to the current implementation.
OK. Not so serious a note...
#Comment Re: On big buffers made: 2015-03-26 14:20:28.467706+00 by:
Dan Lyke
Flutterby formatter: I didn't have a good way to tell when code started and stopped, so if I see code (ie: /^#!/) I keep going as a <pre> block 'til the next double space.
/. comments: I was shocked by how not bad they were here.
The big issue is that their string manipulation is doing all sorts of wrong, the operating system isn't, and this is a good reminder in higher level languages to avoid string += in favor of string builders, like stringstream.
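In Python, for instance, the builder-style equivalent is collecting pieces in a list and joining once at the end (my example, not from the paper):
import timeit
numIter = 1000000

# naive += in a loop; CPython sometimes optimizes appends in place,
# but you can't count on that in every runtime
def plus_equals():
    s = ""
    for i in range(numIter):
        s += "1"
    return s

# builder-style: collect pieces, join once at the end
def join_builder():
    pieces = []
    for i in range(numIter):
        pieces.append("1")
    return "".join(pieces)

for fn in (plus_equals, join_builder):
    start = timeit.default_timer()
    fn()
    print fn.__name__ + ": " + str(timeit.default_timer() - start)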
#Comment Re: On big buffers made: 2015-03-26 14:44:53.049269+00 by:
ebradway
Formatter: I tried using tags and that was worse.
/.: Except for the gratuitous Python/Java bashing, they were insightful.
#Comment Re: On big buffers made: 2015-03-26 15:56:21.316207+00 by:
Dan Lyke
Yeah, see, I'm no longer sure that Python/Java bashing is gratuitous...