What you will learn
The guide provided by this document covers the following topics:
- an example of a C++ unit testing framework,
- an advanced case of the config.json configuration file,
- scripts for scenario stages,
- usage of project utilities,
- advanced usage of WeightedScoreTestsGrader.
Unit tests with weights using Google Test
This is an example of a hypothetical C++11 application that uses the CMake build automation tool with Google Test as the testing framework. We assume the following:
- there is a CMakeLists.txt file in the workspace ($SE_PATH_WORKSPACE) directory,
- there is a run scenario launching the mentioned CLI application,
- there is a test scenario evaluating the application:
  - the test of the application is successful if at least one unit test passes,
  - the score is the sum of all passing tests; each test has a different score assigned.
Project layout
We plan to have the following project layout:
├── calculator.h
├── CMakeLists.txt
├── HELLO.md
├── main.cpp
└── test.cpp
- calculator.h - a header library file that contains the main logic the user has to implement,
- main.cpp - the CLI entry point, an application that lets the user use the library,
- test.cpp - a unit test file with all the unit tests for calculator.h,
- CMakeLists.txt - the CMake configuration that specifies build targets for the main CLI application as well as for the testing binary that executes the unit tests,
- HELLO.md - a readme file with instructions for the user.
We want the user to implement the correct logic in calculator.h so that all tests pass.
Specifying project configuration in config.json
We start by changing the configuration of the project:
{
"version": 2.1,
"custom": {
"sphere_engine_example_version": 1.0
},
"scenarios": {
"run": {
"stages": {
"build": {
"command": "bash -c 'mkdir -p _build && cd _build && cmake .. && make'",
"timeout": 120
},
"run": {
"command": "./_build/main"
}
},
"output": {
"type": "console"
}
},
"test": {
"tests_weights": [],
"stages": {
"build": {
"command": "bash -c 'mkdir -p _build && cd _build && cmake .. && make'",
"timeout": 120
},
"run": {
"command": "bash -c 'GTEST_OUTPUT=\"xml:$RUNTIME/report.xml\" ./_build/test ; exit 0'"
},
"test": {
"command": "se_utils_cli evaluate UnitTestsEvaluator --grader WeightedScoreTestsGrader --converter XUnitConverter"
}
},
"output": {
"type": "tests",
"path": "/sphere-engine/runtime/report.json"
}
}
},
"files": {
"open_at_startup": [
"main.cpp",
"calculator.h",
"HELLO.md"
]
}
}
Now let us explain what happens here.
Specification of the run scenario:
"run": {
"stages": {
"build": {
"command": "bash -c 'mkdir -p _build && cd _build && cmake .. && make'",
"timeout": 120
},
"run": {
"command": "./_build/main"
}
},
"output": {
"type": "console"
}
}
We specify the run scenario using an inline bash script for building. You could create a separate file for all the commands, but our example is simple enough that we decided to embed the entire command directly in the config:
bash -c 'mkdir -p _build && cd _build && cmake .. && make'
This command creates the _build directory and executes the cmake command that generates all the makefiles. After that we build all the targets using make.
Note that in this case we will have two targets:
- main to build the main.cpp executable,
- test to build the test.cpp executable.
You can specify only one target for this scenario as we are interested only in running the CLI application:
bash -c 'mkdir -p _build && cd _build && cmake .. && make main'
However, building is fast enough that we decided to build everything at once. It's also more maintainable to have only one command specified in both the run and test scenarios.
{
"timeout": 120
}
Here we specify an increased timeout. This is done because our cmake config in CMakeLists.txt will fetch a fixed version of Google Test, so the build may sometimes take a bit longer than usual. By increasing the timeout we make sure that even if fetching the dependency is slow, the build stage will still succeed.
{
"type": "console"
}
We also specify the output type as console. This is the standard procedure when launching CLI applications.
Specification of the test scenario:
"test": {
"tests_weights": [],
"stages": {
"build": {
"command": "bash -c 'mkdir -p _build && cd _build && cmake .. && make'",
"timeout": 120
},
"run": {
"command": "bash -c 'GTEST_OUTPUT=\"xml:$RUNTIME/report.xml\" ./_build/test ; exit 0'"
},
"test": {
"command": "se_utils_cli evaluate UnitTestsEvaluator --grader WeightedScoreTestsGrader --converter XUnitConverter"
}
},
"output": {
"type": "tests",
"path": "/sphere-engine/runtime/report.json"
}
}
In the test scenario we use tests_weights to specify scores for the individual tests defined in test.cpp. That field will be discussed below.
In the test scenario the build command and timeout are the same as before. The interesting part is the run stage of the scenario:
{
"run": {
"command": "bash -c 'GTEST_OUTPUT=\"xml:$RUNTIME/report.xml\" ./_build/test ; exit 0'"
}
}
We run the unit test executable, but set the output to the correct location. If we don't specify GTEST_OUTPUT, Google Test will by default print its output to stdout, which wouldn't allow se-utils to pick up and format the results. The xml: prefix makes sure that the output is xUnit-compatible XML.
We also use exit 0, because Google Test exits with a non-zero code if any test fails. In that case Sphere Engine would register that the test scenario has failed, and we want to prevent that behavior: a single failed test doesn't mean that the execution of the tests itself failed.
{
"command": "se_utils_cli evaluate UnitTestsEvaluator --grader WeightedScoreTestsGrader --converter XUnitConverter"
}
In the test stage of the test scenario we use the built-in utility to load and score the tests based on the generated report.xml file. --converter XUnitConverter means that we use xUnit-format input. We also use --grader WeightedScoreTestsGrader to change the default grader to a weighted one, which will allow us to specify scores for individual test cases.
{
"type": "tests",
"path": "/sphere-engine/runtime/report.json"
}
At the end of the test scenario we specify the output format to be tests. That setting will display the results in the interactive UI.
Other config properties:
{
"files": {
"open_at_startup": [
"main.cpp",
"calculator.h",
"HELLO.md"
]
}
}
We specify which files should be opened when the workspace is created. We want to show the readme content, the code of the example CLI application, and the crucial calculator.h file where the user should implement the correct library functionality.
Providing the CMake build configuration
We create CMakeLists.txt with the following content:
cmake_minimum_required(VERSION 3.1)
project(CONSOLE_CPP_GOOGLE_TEST)
set(CMAKE_POLICY_DEFAULT_CMP0077 NEW)
set(FETCHCONTENT_QUIET FALSE)
include(FetchContent)
FetchContent_Declare(
googletest
URL https://github.com/google/googletest/archive/03597a01ee50ed33e9dfd640b249b4be3799d395.zip
GIT_PROGRESS TRUE
)
set(gtest_force_shared_crt ON CACHE BOOL "" FORCE)
FetchContent_MakeAvailable(googletest)
link_libraries(gtest_main pthread)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_INCLUDE_CURRENT_DIR True)
set(CMAKE_CXX_FLAGS "-fprofile-arcs -ftest-coverage")
set(CMAKE_EXE_LINKER_FLAGS "-lgcov")
set(SIMPLE "test")
add_executable(${SIMPLE} "test.cpp")
set(MAIN "main")
add_executable(${MAIN} "main.cpp")
Project meta
cmake_minimum_required(VERSION 3.1)
project(CONSOLE_CPP_GOOGLE_TEST)
The first part of the cmake config specifies the minimum required cmake version and the project name.
Use Google Test as a project dependency
set(CMAKE_POLICY_DEFAULT_CMP0077 NEW)
set(FETCHCONTENT_QUIET FALSE)
include(FetchContent)
FetchContent_Declare(
googletest
URL https://github.com/google/googletest/archive/03597a01ee50ed33e9dfd640b249b4be3799d395.zip
GIT_PROGRESS TRUE
)
set(gtest_force_shared_crt ON CACHE BOOL "" FORCE)
FetchContent_MakeAvailable(googletest)
link_libraries(gtest_main pthread)
The second part of the build config specifies that we want to download an external zip file with a Google Test release and use it as a library.
set(CMAKE_POLICY_DEFAULT_CMP0077 NEW)
set(FETCHCONTENT_QUIET FALSE)
These lines silence the CMP0077 policy warning and make sure that fetching prints progress output. For more details on what they do, please see the CMP0077 policy documentation and the FETCHCONTENT_QUIET variable documentation for FetchContent.
include(FetchContent)
FetchContent_Declare(
googletest
URL https://github.com/google/googletest/archive/03597a01ee50ed33e9dfd640b249b4be3799d395.zip
GIT_PROGRESS TRUE
)
We use FetchContent to download the Google Test zip.
set(gtest_force_shared_crt ON CACHE BOOL "" FORCE)
Enabling the gtest_force_shared_crt option makes gtest link the runtime dynamically as well, matching the project in which it is included. For more details please see the official Google Test docs.
FetchContent_MakeAvailable(googletest)
link_libraries(gtest_main pthread)
We make sure that the library is available to cmake and that it is linked to the executables.
Specification for build targets
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_INCLUDE_CURRENT_DIR True)
set(CMAKE_CXX_FLAGS "-fprofile-arcs -ftest-coverage")
set(CMAKE_EXE_LINKER_FLAGS "-lgcov")
set(SIMPLE "test")
add_executable(${SIMPLE} "test.cpp")
set(MAIN "main")
add_executable(${MAIN} "main.cpp")
We use the C++11 standard and specify compiler and linker flags that enable gcc's gcov coverage by default.
We will have two targets:
- test,
- main.
Boilerplate source code
We provide the following boilerplate code for the user:
main.cpp:
#include <cstdio>
#include <iostream>
#include "calculator.h"
using namespace std;
int main() {
printf("Welcome to our calculator sample program:\n add(2, 4) = %d\n\nDone.", add(2, 4));
return 0;
}
calculator.h:
int add(int a, int b){
//////////////////////////
// Write your code here
// Solution:
// return a+b;
return 0;
}
int mul(int a, int b){
//////////////////////////
// Write your code here
// Solution:
// return a*b;
return 0;
}
Writing tests
Now we jump to writing our test cases.
test.cpp:
// Include our calculator library
#include "calculator.h"
// Include gtest to define tests
#include <gtest/gtest.h>
// Simple test that asserts that adding two zeros produces 0
TEST(SimpleCalculatorTests, test_add_zeros) {
EXPECT_EQ(add(0, 0), 0) << "Failed to add 0+0";
}
// Simple test that asserts that multiplying two zeros produces 0
TEST(SimpleCalculatorTests, test_mul_zeros) {
EXPECT_EQ(mul(0, 0), 0) << "Failed to mul 0*0";
}
// Some custom test cases for addition
TEST(SimpleCalculatorTests, test_add_operation) {
EXPECT_EQ(add(1, 2), 3) << "Failed to add 1+2";
EXPECT_EQ(add(10000, 10000), 20000) << "Failed to add 10000+10000";
EXPECT_EQ(add(1, -1), 0) << "Failed to add 1+(-1)";
EXPECT_EQ(add(-1, 1), 0) << "Failed to add (-1)+1";
EXPECT_EQ(add(-1, -1), -2) << "Failed to add (-1)+(-1)";
}
// Some custom test cases for multiplication
TEST(SimpleCalculatorTests, test_mul_operation) {
EXPECT_EQ(mul(123456, 1), 123456) << "Failed to mul 123456*1";
EXPECT_EQ(mul(1, 123456), 123456) << "Failed to mul 1*123456";
EXPECT_EQ(mul(0, 123456), 0) << "Failed to mul 0*123456";
EXPECT_EQ(mul(123456, 0), 0) << "Failed to mul 123456*0";
}
// More complex test case that uses both multiplication and addition
TEST(AdvancedCalculatorTests, test_advanced) {
EXPECT_EQ(add(mul(2, 3), mul(2, 5)), mul(add(3, 5), 2)) << "Failed assertion 2*3+2*5 == 2*(3+5)";
EXPECT_EQ(add(add(3, -2), -1), 0) << "Failed assertion 3-2-1 == 0";
}
Step 1: Use default weights for all of the tests
Let's start from the initial configuration, with the tests_weights field set to an empty array []:
{
"version": 2.1,
"custom": {
"sphere_engine_example_version": 1.0
},
"scenarios": {
"run": {
"stages": {
"build": {
"command": "bash -c 'mkdir -p _build && cd _build && cmake .. && make'",
"timeout": 120
},
"run": {
"command": "./_build/main"
}
},
"output": {
"type": "console"
}
},
"test": {
"tests_weights": [],
"stages": {
"build": {
"command": "bash -c 'mkdir -p _build && cd _build && cmake .. && make'",
"timeout": 120
},
"run": {
"command": "bash -c 'GTEST_OUTPUT=\"xml:$RUNTIME/report.xml\" ./_build/test ; exit 0'"
},
"test": {
"command": "se_utils_cli evaluate UnitTestsEvaluator --grader WeightedScoreTestsGrader --converter XUnitConverter"
}
},
"output": {
"type": "tests",
"path": "/sphere-engine/runtime/report.json"
}
}
},
"files": {
"open_at_startup": [
"main.cpp",
"calculator.h",
"HELLO.md"
]
}
}
If we run the test scenario, we should see the following output in the console:
[==========] Running 5 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 4 tests from SimpleCalculatorTests
[ RUN ] SimpleCalculatorTests.test_add_zeros
[ OK ] SimpleCalculatorTests.test_add_zeros (0 ms)
[ RUN ] SimpleCalculatorTests.test_mul_zeros
[ OK ] SimpleCalculatorTests.test_mul_zeros (0 ms)
[ RUN ] SimpleCalculatorTests.test_add_operation
[ FAILED ] SimpleCalculatorTests.test_add_operation (0 ms)
[ RUN ] SimpleCalculatorTests.test_mul_operation
[ FAILED ] SimpleCalculatorTests.test_mul_operation (0 ms)
[----------] 4 tests from SimpleCalculatorTests (1 ms total)
[----------] 1 test from AdvancedCalculatorTests
[ RUN ] AdvancedCalculatorTests.test_advanced
[ OK ] AdvancedCalculatorTests.test_advanced (0 ms)
[----------] 1 test from AdvancedCalculatorTests (0 ms total)
[----------] Global test environment tear-down
[==========] 5 tests from 2 test suites ran. (1 ms total)
2 FAILED TESTS
The UI will display the following test statuses:
- SimpleCalculatorTests.test_add_zeros - OK,
- SimpleCalculatorTests.test_mul_zeros - OK,
- SimpleCalculatorTests.test_add_operation - FAILURE,
- SimpleCalculatorTests.test_mul_operation - FAILURE,
- AdvancedCalculatorTests.test_advanced - OK.
The total score of the solution would be 3.0:
Test class | Test name | Test result | Assigned score |
---|---|---|---|
SimpleCalculatorTests | test_add_zeros | OK | +1 |
SimpleCalculatorTests | test_mul_zeros | OK | +1 |
SimpleCalculatorTests | test_add_operation | FAILURE | 0 |
SimpleCalculatorTests | test_mul_operation | FAILURE | 0 |
AdvancedCalculatorTests | test_advanced | OK | +1 |
This is due to the default behavior of WeightedScoreTestsGrader, which assigns 1 to a passed test case and 0 otherwise.
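As a quick check, the assigned scores sum to 1 + 1 + 0 + 0 + 1 = 3.0.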
Step 2: Specify negative scores for failed tests
Now we want to penalize the user when a test fails. Instead of using 0 as the score for failed test cases, we want to use a negative score of -1:
{
"tests_weights": [
{
"status": "failure",
"weight": -1
}
]
}
The WeightedScoreTestsGrader works in the following way:
- for each test, find the first selector that matches that test case and assign the weight specified for that selector. For ease of use, if no selector matches, assign 1 for a successful test case and 0 otherwise,
- sum up all the weights,
- if the sum is positive, the final status is OK (FAIL otherwise).
A selector is an object that specifies the following properties:
- classname - matches the class name of the test. "*" matches any class name. By default "*" is used if the property is not explicitly specified,
- name - matches the name of the test case. "*" matches any name. By default "*" is used if the property is not explicitly specified,
- status - matches the status of the test. "*" matches any test case status. This property is handy when you want to assign negative scores to tests that fail. The status of a test can be failure, ok or error ("ok" is the default value),
- weight - the weight for a test case that matches this selector. This is a required field without a default value.
So in our case we assign a weight of -1 to all failed tests.
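To make the matching and summing rules concrete, here is a minimal C++ sketch of such a grader. It is not the actual se-utils implementation; the Selector and TestCase structures and the grade function are hypothetical and only illustrate the behavior described above.
// Minimal sketch of a weighted grader (hypothetical, for illustration only):
// take the first matching selector per test case, fall back to 1/0 scoring,
// sum the weights and derive the final status from the sign of the sum.
#include <string>
#include <vector>

struct Selector {               // mirrors one tests_weights entry
    std::string classname = "*";
    std::string name = "*";
    std::string status = "ok";
    double weight = 0.0;
};

struct TestCase {               // one test case parsed from report.xml
    std::string classname;
    std::string name;
    std::string status;         // "ok", "failure" or "error"
};

static bool matches(const std::string& pattern, const std::string& value) {
    return pattern == "*" || pattern == value;
}

double grade(const std::vector<TestCase>& tests, const std::vector<Selector>& selectors) {
    double total = 0.0;
    for (const TestCase& t : tests) {
        bool matched = false;
        for (const Selector& s : selectors) {   // first matching selector wins
            if (matches(s.classname, t.classname) && matches(s.name, t.name) &&
                matches(s.status, t.status)) {
                total += s.weight;
                matched = true;
                break;
            }
        }
        if (!matched) {
            // default behavior: 1 for a passing test case, 0 otherwise
            total += (t.status == "ok") ? 1.0 : 0.0;
        }
    }
    return total;   // the final status is OK when the sum is positive
}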
Now if we run the test scenario once again, we would have the same statuses:
- SimpleCalculatorTests.test_add_zeros - OK,
- SimpleCalculatorTests.test_mul_zeros - OK,
- SimpleCalculatorTests.test_add_operation - FAILURE,
- SimpleCalculatorTests.test_mul_operation - FAILURE,
- AdvancedCalculatorTests.test_advanced - OK.
But the final score would be 1.0:
Test class | Test name | Test result | Assigned score |
---|---|---|---|
SimpleCalculatorTests | test_add_zeros | OK | +1 |
SimpleCalculatorTests | test_mul_zeros | OK | +1 |
SimpleCalculatorTests | test_add_operation | FAILURE | -1 |
SimpleCalculatorTests | test_mul_operation | FAILURE | -1 |
AdvancedCalculatorTests | test_advanced | OK | +1 |
Previously, achieving a negative final score wasn't possible.
Step 3: Specify granular scores for all the cases
We change the tests_weights
field to the specified one:
{
"tests_weights": [
{
"name": "test_add_zeros",
"weight": 1
},
{
"name": "test_mul_zeros",
"weight": 1
},
{
"name": "test_add_operation",
"weight": 20
},
{
"name": "test_mul_operation",
"weight": 20
},
{
"classname": "AdvancedCalculatorTests",
"weight": 100
},
{
"status": "failure",
"weight": -1
}
]
}
The above configuration will assign the following scores:
- both test_add_zeros and test_mul_zeros will add 1 to the final score if the test passes,
- both test_add_operation and test_mul_operation will add 20 to the final score if the test passes,
- all tests in the AdvancedCalculatorTests suite will add 100 to the final score if the test case passes. In our case there is only one test, test_advanced, in that suite.
Now if we run the test scenario once again, we would have the same statuses, but the final score would be 100.0:
Test class | Test name | Test result | Assigned score |
---|---|---|---|
SimpleCalculatorTests | test_add_zeros | OK | +1 |
SimpleCalculatorTests | test_mul_zeros | OK | +1 |
SimpleCalculatorTests | test_add_operation | FAILURE | -1 |
SimpleCalculatorTests | test_mul_operation | FAILURE | -1 |
AdvancedCalculatorTests | test_advanced | OK | +100 |
You can see the problem with that: if we specify large positive scores for the tests, the default failure score of -1 isn't very helpful. Intuitively we would expect the negative and positive scores for a test case to have the same magnitude.
We alter the configuration to also specify granular negative scores for failures:
{
"tests_weights": [
{
"name": "test_add_zeros",
"weight": 1
},
{
"name": "test_add_zeros",
"status": "failure",
"weight": -1
},
{
"name": "test_mul_zeros",
"weight": 1
},
{
"name": "test_mul_zeros",
"status": "failure",
"weight": -1
},
{
"name": "test_add_operation",
"weight": 20
},
{
"name": "test_add_operation",
"status": "failure",
"weight": -20
},
{
"name": "test_mul_operation",
"weight": 20
},
{
"name": "test_mul_operation",
"status": "failure",
"weight": -20
},
{
"classname": "AdvancedCalculatorTests",
"weight": 100
},
{
"classname": "AdvancedCalculatorTests",
"status": "failure",
"weight": -100
}
]
}
Now the behavior is more intuitive. The final score is 62.0:
Test class | Test name | Test result | Assigned score |
---|---|---|---|
SimpleCalculatorTests | test_add_zeros | OK | +1 |
SimpleCalculatorTests | test_mul_zeros | OK | +1 |
SimpleCalculatorTests | test_add_operation | FAILURE | -20 |
SimpleCalculatorTests | test_mul_operation | FAILURE | -20 |
AdvancedCalculatorTests | test_advanced | OK | +100 |
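As a quick check, the assigned scores sum to 1 + 1 - 20 - 20 + 100 = 62.0.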
Step 4: Provide some example, non-scored test cases
You may look at test_add_zeros and test_mul_zeros and decide that they are too simple to even be scored. We would like to change their scores to 0. The user will still be able to run those tests and use them to debug their application, so they will still be useful.
Instead of changing the weights to zero, we can remove those rules entirely and add a default case to our config:
{
"tests_weights": [
{
"name": "test_add_operation",
"weight": 20
},
{
"name": "test_add_operation",
"status": "failure",
"weight": -20
},
{
"name": "test_mul_operation",
"weight": 20
},
{
"name": "test_mul_operation",
"status": "failure",
"weight": -20
},
{
"classname": "AdvancedCalculatorTests",
"weight": 100
},
{
"classname": "AdvancedCalculatorTests",
"status": "failure",
"weight": -100
},
{
"weight": 0
}
]
}
In this case the tests mentioned in the config will have positive (or negative) scores, and everything else will fall back to the last selector with weight 0. This means that all tests not directly specified here will have a score of zero.
The final score is now 60.0:
Test class | Test name | Test result | Assigned score |
---|---|---|---|
SimpleCalculatorTests | test_add_zeros | OK | 0 |
SimpleCalculatorTests | test_mul_zeros | OK | 0 |
SimpleCalculatorTests | test_add_operation | FAILURE | -20 |
SimpleCalculatorTests | test_mul_operation | FAILURE | -20 |
AdvancedCalculatorTests | test_advanced | OK | +100 |
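As a quick check, the assigned scores sum to 0 + 0 - 20 - 20 + 100 = 60.0.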
Step 5: Simplify the configuration with improved test suites
As a bonus, we can simplify the weights specification by grouping tests in a better way. You probably noticed that we can not only provide weights for individual test cases, but also match groups of them using class names:
{
"classname": "AdvancedCalculatorTests",
"weight": 100
}
We will now edit test.cpp to rename the test classes:
/** ... the rest of the file **/
// Changed class name SimpleCalculatorTests ~> BaseCalculatorTests
TEST(BaseCalculatorTests, test_add_zeros) {
EXPECT_EQ(add(0, 0), 0) << "Failed to add 0+0";
}
// Changed class name SimpleCalculatorTests ~> BaseCalculatorTests
TEST(BaseCalculatorTests, test_mul_zeros) {
EXPECT_EQ(mul(0, 0), 0) << "Failed to mul 0*0";
}
// Left the class name as it was
TEST(SimpleCalculatorTests, test_add_operation) {
EXPECT_EQ(add(1, 2), 3) << "Failed to add 1+2";
EXPECT_EQ(add(10000, 10000), 20000) << "Failed to add 10000+10000";
EXPECT_EQ(add(1, -1), 0) << "Failed to add 1+(-1)";
EXPECT_EQ(add(-1, 1), 0) << "Failed to add (-1)+1";
EXPECT_EQ(add(-1, -1), -2) << "Failed to add (-1)+(-1)";
}
// Left the class name as it was
TEST(SimpleCalculatorTests, test_mul_operation) {
EXPECT_EQ(mul(123456, 1), 123456) << "Failed to mul 123456*1";
EXPECT_EQ(mul(1, 123456), 123456) << "Failed to mul 1*123456";
EXPECT_EQ(mul(0, 123456), 0) << "Failed to mul 0*123456";
EXPECT_EQ(mul(123456, 0), 0) << "Failed to mul 123456*0";
}
/** ... the rest of the file **/
Now we can simplify our selectors to match the entire SimpleCalculatorTests test class at once:
{
"tests_weights": [
{
"classname": "SimpleCalculatorTests",
"weight": 20
},
{
"classname": "SimpleCalculatorTests",
"status": "failure",
"weight": -20
},
{
"classname": "AdvancedCalculatorTests",
"weight": 100
},
{
"classname": "AdvancedCalculatorTests",
"status": "failure",
"weight": -100
},
{
"weight": 0
}
]
}
The final score will be the same as before, 60.0. The only things that change are the names of the reported test classes:
Test class | Test name | Test result | Assigned score |
---|---|---|---|
BaseCalculatorTests | test_add_zeros | OK | 0 |
BaseCalculatorTests | test_mul_zeros | OK | 0 |
SimpleCalculatorTests | test_add_operation | FAILURE | -20 |
SimpleCalculatorTests | test_mul_operation | FAILURE | -20 |
AdvancedCalculatorTests | test_advanced | OK | +100 |
Using classname is the preferred way of assigning weights to test cases, as it's easier to maintain; however, depending on your specific use case, you can choose whatever solution fits best.
If you want to learn more about specifying scores for tests, please read the detailed documentation of WeightedScoreTestsGrader.