Upgrade
We upgraded the line speed between two data centres.
The transfer speed stayed the same at first because it was rate limited at the server.
After checking that the link is working OK, the first thing we'd like to do is see what difference the upgrade makes to the speed with the rate limiting removed.
However, the logging for the replication application isn't great.
The log shows PUT start and end events with timestamps, and that's about all. There is no "bytes" or "bytes transferred" field, so we can't tell whether the bytes per second have increased.
So how do we show that the speed has improved when there is no rate information?
The Secret
My answer is to look at the distribution of the transfer times and compare it before and after. So I wrote a script that prints a histogram of the transfer times. If the histogram has shifted, something has changed, assuming that the number of replication requests is roughly the same.
Here's the script
#!/usr/bin/env python3
import json
import time

ourdataurl = "storage.teapotic.com"
act = {}   # tx-id -> BEGIN timestamp
dist = {}  # bucket lower bound -> number of transfers
buf = ""   # holds a partial last line until the rest is written

with open("/var/log/replication.log", "r") as f:
    p = 0
    t = time.time() + 600  # 600 seconds is 10 minutes
    while time.time() < t:
        # tail the file
        f.seek(p)
        latest_data = f.read()
        p = f.tell()
        if not latest_data:
            time.sleep(1)  # nothing new yet, don't spin
            continue
        # split into complete lines; each log line is one JSON object
        lines = (buf + latest_data).split("\n")
        buf = lines.pop()
        for line in lines:
            try:
                x = json.loads(line)
                msg = x['message']
            except (ValueError, KeyError, TypeError):
                continue
            # only look at PUTs for our storage URL
            if ourdataurl not in msg:
                continue
            # find BEGIN timestamps and store by tx-id in a dict
            if msg.startswith("BEGIN object PUT"):
                act[x['tx-id']] = int(x['timestamp'])
            # find END timestamps and use the previous BEGIN to calc time
            elif msg.startswith("END object PUT"):
                try:
                    start = act.pop(x['tx-id'])
                except KeyError:
                    continue
                dur = int(x['timestamp']) - start
                print(dur)
                # now we have the dur, store in a 50-wide bucket for histogram;
                # anything over 1999 goes in the 2000 bucket
                b = min(max(dur, 0) // 50 * 50, 2000)
                dist[b] = dist.get(b, 0) + 1

# display the raw data and then the histogram
print(dist)
for r in range(0, 2001, 50):
    print("%4d %s" % (r, dist.get(r, 0) * "*"))
Here's the before histogram
0
50
100
150
200
250
300
350 *
400 *
450 ***
500 **
550 *****
600
650 *
700 *
750 ***
800 *
850
900
950 *
1000
1050 *
1100 **
And here's after
0
50
100
150 ******
200 *******************************************************************************************
250 ***********************************************************************************************************************************************
300 ********************************************************************
350 *************************************
400 **************
450 ***
500 *
550 *
600 **
650
700 *
750
800
850
900
950
1000
1050
1100
As you can see, there are two interesting things.
First, the before histogram has fewer data points.
Second, the peak of the "bump" in the transfer time distribution is at 550 seconds before and 250 seconds after.
The same method can be used on any data where a difference needs to be shown but the quantities are missing.
So I think we have a winner.
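Reading the peak off by eye works for two histograms, but it can also be done in code. The following is a minimal sketch and not part of the original script: `peak_bucket` and `share` are hypothetical helper names, and the `before` counts are transcribed from the stars in the before histogram above. Converting counts to shares makes two runs comparable even when one of them has far fewer data points.

```python
def peak_bucket(dist):
    """Return the bucket with the highest count (lowest bucket wins a tie)."""
    return max(sorted(dist), key=lambda b: dist[b])


def share(dist):
    """Convert raw counts to fractions of the total, so that runs with
    different numbers of transfers can be compared."""
    total = sum(dist.values())
    if total == 0:
        return {}
    return {b: n / total for b, n in sorted(dist.items())}


# Counts transcribed from the "before" histogram in this post.
before = {350: 1, 400: 1, 450: 3, 500: 2, 550: 5, 650: 1, 700: 1,
          750: 3, 800: 1, 950: 1, 1050: 1, 1100: 2}

if __name__ == "__main__":
    peak = peak_bucket(before)
    print("peak bucket:", peak)
    print("share of transfers in peak bucket: %.2f" % share(before)[peak])
```

Running the same two functions on the `dist` dictionary printed by the after run gives the second half of the comparison.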
What I'd do differently next time
Use the Python pandas library. pandas can handle the bucketing and also makes nicer graphs.
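As a rough idea of what that could look like, here is a small sketch of the bucketing step in pandas. It assumes the durations have already been collected into a list (for example the `dur` values the script prints); `bucket_counts` and its `width` and `cap` parameters are names made up for this example, and it needs pandas installed.

```python
import pandas as pd


def bucket_counts(durations, width=50, cap=2000):
    """Count durations per `width`-wide bucket, with everything at or
    above `cap` collected in the last bucket."""
    s = pd.Series(durations, dtype="int64").clip(lower=0, upper=cap)
    buckets = (s // width) * width
    # reindex so that empty buckets show up with a count of 0
    all_buckets = list(range(0, cap + width, width))
    return buckets.value_counts().reindex(all_buckets, fill_value=0)


if __name__ == "__main__":
    # made-up sample durations, just to show the shape of the result
    counts = bucket_counts([10, 60, 70, 260, 2500])
    print(counts[counts > 0])
```

With matplotlib installed as well, `counts.plot.bar()` should draw the histogram as a bar chart instead of rows of stars.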