Implement timing metrics #70

Merged: 6 commits merged on May 28, 2019
searcher.py: 60 changes (59 additions, 1 deletion)
@@ -2,6 +2,8 @@
import os
import subprocess
import sys
import re
import time


class Searcher:
@@ -45,13 +47,69 @@ def search(self, client, output_path_guest, topic_path_host, topic_path_guest, g
            "top_k": self.config.top_k
        }

        # Write only the first query (up to its closing </top> tag) to a separate topic file
        with open(os.path.join(topic_path_host, os.path.basename(self.config.topic)), 'r') as file:
            queries = file.read()
        query_end = queries.find('</top>')
        if query_end == -1:
            sys.exit('Query format unknown...')
        single_query = queries[:query_end + len('</top>')]
        single_query_file = os.path.splitext(os.path.basename(self.config.topic))[0] + '.single.txt'
        with open(os.path.join(topic_path_host, single_query_file), 'w') as out_file:
            out_file.write(single_query)

        # Time a near-empty search (one query); its cost is treated as the index load time
        search_args['topic']['path'] = os.path.join(topic_path_guest, single_query_file)
        container = client.containers.run("{}:{}".format(self.config.repo, save_tag),
                                          command="bash -c 'time /search --json {}'".format(json.dumps(json.dumps(search_args))),
                                          volumes=volumes, detach=True)

Some images (such as Alpine-based ones) don't have bash by default. Can this be changed to sh (which is often linked to bash)? It seems to work fine with sh.

Contributor Author

Some images based on Debian or its derivatives have bash (which provides time as a builtin) installed but no system-wide time command. In these images sh also tends to be linked to dash, which doesn't provide that builtin. I'm working on a solution now that will use either the time command or the bash builtin, depending on what is available. I'll update the pull request when that's done.

Ah, got it. Once you've added that, we'll merge this PR.
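A minimal sketch of the fallback described in this thread, assuming the caller has already probed the image for bash and for a time command (for example by running `command -v bash` under sh in a throwaway container). `build_timed_command` and its `has_bash` / `has_time_binary` flags are hypothetical names; this is not the code that was eventually merged.

```python
import json


def build_timed_command(search_args: dict, has_bash: bool, has_time_binary: bool) -> str:
    """Hypothetical helper: choose how to wrap /search with a timer.

    has_bash and has_time_binary are assumed to be detected by the caller.
    """
    # Quote the JSON twice, as searcher.py does, so it survives the shell as one argument
    payload = json.dumps(json.dumps(search_args))
    if has_bash:
        # In bash, time is a reserved word, so no separate time binary is needed
        return "bash -c 'time /search --json {}'".format(payload)
    if has_time_binary:
        # No bash (e.g. Alpine): rely on a time command that sh can find on PATH
        return "sh -c 'time /search --json {}'".format(payload)
    raise RuntimeError("image has neither bash nor a time command")


if __name__ == "__main__":
    print(build_timed_command({"top_k": 10}, has_bash=False, has_time_binary=True))
```

Preferring bash keeps the output in the `XmY.YYYs` form the PR's regex expects. An external timer may print something else entirely (GNU time's default output is a single line such as `0.00user 0.00system ...`), so the parsing side may need to adapt too.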

        # Collect the real/user/sys lines printed by time
        load_times = []
        for line in container.logs(stream=True):
            match = re.match(r'^(real|user|sys)\t*(.*)m(.*)s$', line.decode('utf-8'))
            if match:
                load_times.append(match)

        # Time the actual search over all queries
        search_args['topic']['path'] = os.path.join(topic_path_guest, os.path.basename(self.config.topic))
        print("Starting container from saved image...")
        # Previously an untimed run: command="sh -c '/search --json {}'"
        container = client.containers.run("{}:{}".format(self.config.repo, save_tag),
                                          command="bash -c 'time /search --json {}'".format(json.dumps(json.dumps(search_args))),
                                          volumes=volumes, detach=True)

        search_times = []
        print("Logs for search in container with ID {}...".format(container.id))
        for line in container.logs(stream=True):
            match = re.match(r'^(real|user|sys)\t*(.*)m(.*)s$', line.decode('utf-8'))
            if match:
                search_times.append(match)
            print(line.decode('utf-8'), end="")
        print()

        print('**********')
        print('Index load timing information')
        print(load_times[0].group(0))
        print(load_times[1].group(0))
        print(load_times[2].group(0))
        print()

        print('**********')
        print('Search timing information')
        print(search_times[0].group(0))
        print(search_times[1].group(0))
        print(search_times[2].group(0))
        print()

        # Subtract the load timing from the search timing for real, user and sys.
        # Work in total seconds so a minute boundary cannot produce negative seconds.
        result_minutes = []
        result_seconds = []
        for i in range(len(load_times)):
            load_total = 60 * int(load_times[i].group(2)) + float(load_times[i].group(3))
            search_total = 60 * int(search_times[i].group(2)) + float(search_times[i].group(3))
            minutes, seconds = divmod(search_total - load_total, 60)
            result_minutes.append(int(minutes))
            result_seconds.append(seconds)
        print('**********')
        print('Search timing less load')
        print('real\t{}m{:.3f}s'.format(result_minutes[0], result_seconds[0]))
        print('user\t{}m{:.3f}s'.format(result_minutes[1], result_seconds[1]))
        print('sys\t{}m{:.3f}s'.format(result_minutes[2], result_seconds[2]))

Small nit: can a space be added after the minutes so the formatting is consistent with the output of time? That might be useful if the output needs to be parsed later.

**********
Search timing information
real    0m 13.72s
user    0m 17.58s
sys     0m 0.43s

**********
Search timing less load
real    0m12.680s
user    0m15.300s
sys     0m0.330s
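Following the reviewer's point about parsing, a tolerant parser can read both forms shown above: the container's `0m 13.72s` and searcher.py's `0m12.680s`. The sketch below is hypothetical (`TIME_LINE`, `parse_time_line` and `format_time_line` are illustrative names, not part of the PR) and assumes each timing line is a label, whitespace, minutes, `m`, seconds, `s`.

```python
import re
from typing import Optional, Tuple

# Accept tabs or spaces after the label and an optional space after "m",
# so both "real\t0m 13.72s" and "real\t0m12.680s" parse the same way.
TIME_LINE = re.compile(r'^(real|user|sys)\s+(\d+)m\s*(\d+(?:\.\d+)?)s$')


def parse_time_line(line: str) -> Optional[Tuple[str, float]]:
    """Hypothetical helper: turn one timing line into (label, total seconds)."""
    match = TIME_LINE.match(line.strip())
    if match is None:
        return None
    label, minutes, seconds = match.groups()
    return label, 60 * int(minutes) + float(seconds)


def format_time_line(label: str, total_seconds: float) -> str:
    """Format like the container output, with a space after the minutes."""
    minutes, seconds = divmod(total_seconds, 60)
    return '{}\t{}m {:.3f}s'.format(label, int(minutes), seconds)


if __name__ == "__main__":
    print(parse_time_line('real\t0m 13.72s'))
    print(parse_time_line('user\t0m15.300s'))
    print(format_time_line('real', 12.68))
```

Converting to total seconds at parse time also means later arithmetic never has to deal with separate minute and second fields.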

        print()

        print("Evaluating results using trec_eval...")
        for file in os.listdir(self.config.output):
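Taken together, the change estimates search cost by timing a run over only the first query, which it treats as a proxy for index load time, and subtracting that from the timing of the full run. The following standalone sketch shows those two steps as pure functions, assuming topics are delimited by `</top>` as the code checks for. `first_topic` and `search_less_load` are hypothetical names, and raising `ValueError` instead of calling `sys.exit` is a deliberate swap to keep the functions testable.

```python
from typing import Dict

END_TAG = '</top>'


def first_topic(queries: str) -> str:
    """Return the text up to and including the first closing </top> tag."""
    end = queries.find(END_TAG)
    if end == -1:
        raise ValueError('Query format unknown...')
    return queries[:end + len(END_TAG)]


def search_less_load(load: Dict[str, float], search: Dict[str, float]) -> Dict[str, float]:
    """Subtract single-query (load) times from full-search times, in seconds.

    Working in total seconds avoids the negative seconds component that
    subtracting minutes and seconds separately produces across a minute boundary.
    """
    return {label: search[label] - load[label] for label in ('real', 'user', 'sys')}


if __name__ == "__main__":
    topics = '<top>\n<num> 1\n</top>\n<top>\n<num> 2\n</top>\n'
    print(first_topic(topics))
    # Load takes 0m58s and search 1m2s: separate subtraction would give 1m -56s
    load = {'real': 58.0, 'user': 40.0, 'sys': 1.0}
    search = {'real': 62.0, 'user': 45.5, 'sys': 1.5}
    print(search_less_load(load, search))
```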