The evaluation of a solution is done by a program, called the evaluator. The evaluator is the most important part of a problem: it prepares the input sent to the solution and judges its output, producing feedback.
To test a solution, the evaluator has to start a solution process, using the TuringArena library. It can then communicate with the process by directly calling the functions implemented by the solution. The evaluator can start a solution process as many times as it wants.
We'll see later how this communication is implemented.
In this example problem, the evaluator is written in Python, in the file `evaluator.py`.
It tests the solution on several pairs of numbers `a`, `b` (called test cases), checking if the result of `sum(a,b)` is actually `a+b`.
It performs a fixed number of iterations, one per test case. In each iteration it does the following.
- It generates a random pair of numbers `a` and `b`.
- It starts a new solution process, and communicates with it.
- It calls the `sum` function, passing the values of `a` and `b` just generated, and stores the result in a variable `c`.
- If `sum` behaves correctly and returns a value `c` such that `c == a+b`, the evaluator considers this test case passed.
- If there is an error in the solution process, say, the solution attempts to open a file, executes a disallowed system call, or takes too long to answer, then the evaluator is notified by TuringArena, and considers the test case failed.
- If `sum` returns a value which is not the sum of `a` and `b`, the test case is also considered failed.
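The iteration described above can be sketched in plain Python. This is only an illustration of the logic, not the actual contents of the evaluator: it does not use the TuringArena library, and a plain Python callable stands in for the solution process. The names `run_test_case`, `evaluate`, and `reference_solution`, the number of test cases, and the range of the random values are all assumptions made for this sketch.

```python
import random


def reference_solution(a, b):
    # Hypothetical stand-in for the `sum` function implemented by a solution.
    return a + b


def run_test_case(solution, a, b):
    """Return True if the test case (a, b) is passed, False otherwise."""
    try:
        c = solution(a, b)
    except Exception:
        # Stand-in for an error in the solution process
        # (in the real setting, the evaluator is notified by TuringArena).
        return False
    # The test case is passed only if the returned value is the sum.
    return c == a + b


def evaluate(solution, num_cases=10, seed=0):
    """Run a fixed number of random test cases and return their outcomes."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(num_cases):
        # Assumed value range; the real evaluator may use a different one.
        a = rng.randint(0, 100)
        b = rng.randint(0, 100)
        passed = run_test_case(solution, a, b)
        print(f"sum({a}, {b}): {'passed' if passed else 'failed'}")
        outcomes.append(passed)
    return outcomes
```

Calling `evaluate(reference_solution)` runs ten test cases and returns one boolean per test case.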
The evaluator reports the outcome of each test case.
At the end, the evaluator marks the goal as achieved if all the test cases are passed.
In general, a problem may have several goals, which are achieved depending on, say, how many functionalities are implemented, the quality of the outputs, and the computational efficiency of the solution.
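As a rough illustration of how several goals could be derived from the same test results, here is a minimal sketch. The function `compute_goals`, the goal names `all_cases_passed` and `fast_enough`, and the time limit are hypothetical and chosen only for this example; they are not taken from the example problem or from the TuringArena library.

```python
def compute_goals(outcomes, elapsed_seconds, time_limit=1.0):
    """Derive goals from per-test-case outcomes and running times.

    outcomes: list of booleans, one per test case (True means passed).
    elapsed_seconds: list of running times in seconds, one per test case.
    """
    all_passed = all(outcomes)
    # The efficiency goal only makes sense if the answers are also correct.
    within_limit = max(elapsed_seconds, default=0.0) <= time_limit
    return {
        "all_cases_passed": all_passed,
        "fast_enough": all_passed and within_limit,
    }
```

With this structure, a solution that is correct but slow achieves the first goal and not the second.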
Try it yourself!
Try to modify the file `evaluator.py`. Some suggestions:
- Change the number of test cases.
- Make the numbers
- Make the evaluator stop as soon as a test case fails.
- Test the solution on negative values.
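If you want a starting point, two of these suggestions (stopping at the first failed test case and testing negative values) could look roughly like the sketch below. It is a stand-alone illustration that uses a plain Python callable in place of the solution process and does not use the TuringArena library; the function name `evaluate_until_failure` and its parameters are assumptions made for this example.

```python
import random


def evaluate_until_failure(solution, num_cases=20, low=-1000, high=1000, seed=0):
    """Run random test cases and stop at the first failure.

    Returns the number of test cases passed before stopping.
    """
    rng = random.Random(seed)
    passed_count = 0
    for _ in range(num_cases):
        # A negative lower bound makes negative values part of the tests.
        a = rng.randint(low, high)
        b = rng.randint(low, high)
        try:
            ok = solution(a, b) == a + b
        except Exception:
            # Any error raised by the stand-in solution counts as a failure.
            ok = False
        if not ok:
            break  # stop as soon as a test case fails
        passed_count += 1
    return passed_count
```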