Score an entire competition (or a whole AOI) using cw-eval
¶
This recipe describes how to run evaluation of a proposal CSV for an entire competition against a ground truth CSV.
Things to understand before starting¶
When we score entire competitions, we want to ensure that competitors provide submissions for the entire area of interest (AOI), not just a favorable subset, in case they leave out chips that they can't predict well. Therefore, proposal files scored using this pipeline should contain predictions for every chip in the ground truth CSV. The score outputs also provide chip-by-chip results, which can be used to remove non-predicted chips if needed.
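The chip-completeness requirement above can be checked up front. Here is a minimal sketch with pandas, using toy frames and hypothetical chip names (not part of the sample data):

```python
import pandas as pd

# Toy frames with hypothetical chip names: every chip in the ground truth
# should also appear in the proposal CSV before scoring.
truth = pd.DataFrame({'ImageId': ['chip_a', 'chip_a', 'chip_b']})
proposals = pd.DataFrame({'ImageId': ['chip_a']})

# Chips present in the truth CSV but absent from the proposal CSV.
missing = sorted(set(truth['ImageId']) - set(proposals['ImageId']))
print(missing)  # ['chip_b']
```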
When CosmiQ Works runs competitions in partnership with TopCoder, we set some cutoffs for scoring buildings:
- An IoU score > 0.5 is required for a building to count as correctly identified.
- Ground truth buildings fewer than 20 pixels in extent are ignored. However, it is up to competitors to filter out their own small footprint predictions.
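These cutoffs can be illustrated with a toy IoU computation. The sketch below uses axis-aligned boxes rather than the full WKT polygon geometries that the real scoring operates on, so it only demonstrates the thresholds, not the actual matching logic:

```python
# Simplified illustration of the scoring cutoffs using axis-aligned boxes
# (real scoring computes IoU on full WKT polygon geometries).

def box_iou(a, b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

truth = (0, 0, 10, 10)      # 100-pixel ground truth footprint
good_pred = (1, 1, 10, 10)  # heavily overlapping proposal
bad_pred = (6, 6, 16, 16)   # barely overlapping proposal

print(box_iou(truth, good_pred) > 0.5)  # True: counts as a true positive
print(box_iou(truth, bad_pred) > 0.5)   # False: not a match

# A truth footprint under 20 pixels in extent would be ignored entirely:
tiny_truth = (0, 0, 4, 4)   # 4 x 4 = 16 pixels
print((tiny_truth[2] - tiny_truth[0]) * (tiny_truth[3] - tiny_truth[1]) < 20)
```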
Imports¶
For this test case we will only need `cw_eval` installed (see the installation instructions for `cw_eval` if you have not set it up yet).
[1]:
# imports
import os
import cw_eval
from cw_eval.challenge_eval.off_nadir_dataset import eval_off_nadir # runs eval
from cw_eval.data import data_dir # get the path to the sample eval data
import pandas as pd # just for visualizing the outputs in this recipe
Ground truth CSV format¶
The following shows a sample ground truth CSV and the elements it must contain.
[2]:
ground_truth_path = os.path.join(data_dir, 'sample_truth_competition.csv')
pd.read_csv(ground_truth_path).head(10)
[2]:
|   | ImageId | BuildingId | PolygonWKT_Pix | PolygonWKT_Geo |
|---|---------|------------|----------------|----------------|
| 0 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 0 | POLYGON ((476.88 884.61, 485.59 877.64, 490.50... | 1 |
| 1 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 1 | POLYGON ((459.45 858.97, 467.41 853.09, 463.37... | 1 |
| 2 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 2 | POLYGON ((407.34 754.17, 434.90 780.55, 420.27... | 1 |
| 3 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 3 | POLYGON ((311.00 760.22, 318.38 746.78, 341.02... | 1 |
| 4 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 4 | POLYGON ((490.49 742.67, 509.81 731.14, 534.12... | 1 |
| 5 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 5 | POLYGON ((319.28 723.07, 339.97 698.22, 354.29... | 1 |
| 6 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 6 | POLYGON ((466.49 709.69, 484.26 696.45, 502.59... | 1 |
| 7 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 7 | POLYGON ((433.84 673.34, 443.90 663.96, 448.70... | 1 |
| 8 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 8 | POLYGON ((459.24 649.03, 467.38 641.90, 472.84... | 1 |
| 9 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 9 | POLYGON ((403.55 643.50, 416.98 630.51, 440.36... | 1 |
Important points about the CSV format:

- The column denoting the chip ID for a given geospatial location must be titled `ImageId`.
- The column containing geometries must be in WKT format and should be titled `PolygonWKT_Pix`.
- The `BuildingId` column provides a numeric identifier sequentially numbering each building within each chip. Order doesn't matter.
- For chips with no buildings, a single row should be provided with `BuildingId=-1` and `PolygonWKT_Pix="POLYGON EMPTY"`.
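Putting these rules together, a minimal well-formed ground truth CSV might be built like this (the chip names and geometries below are hypothetical placeholders, not part of the sample data):

```python
import pandas as pd

# Minimal sketch of a well-formed ground truth CSV: one chip with two
# buildings, and one empty chip marked with BuildingId=-1 / POLYGON EMPTY.
rows = [
    {'ImageId': 'chip_001', 'BuildingId': 0,
     'PolygonWKT_Pix': 'POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))'},
    {'ImageId': 'chip_001', 'BuildingId': 1,
     'PolygonWKT_Pix': 'POLYGON ((20 20, 30 20, 30 30, 20 30, 20 20))'},
    {'ImageId': 'chip_002', 'BuildingId': -1,
     'PolygonWKT_Pix': 'POLYGON EMPTY'},
]
truth_df = pd.DataFrame(rows,
                        columns=['ImageId', 'BuildingId', 'PolygonWKT_Pix'])
truth_df.to_csv('toy_truth.csv', index=False)
```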
Proposal CSV format¶
[3]:
proposals_path = os.path.join(data_dir, 'sample_preds_competition.csv')
pd.read_csv(proposals_path).head(10)
[3]:
|   | ImageId | BuildingId | PolygonWKT_Pix | Confidence |
|---|---------|------------|----------------|------------|
| 0 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 0 | POLYGON ((0.00 712.83, 158.37 710.28, 160.59 6... | 1 |
| 1 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 1 | POLYGON ((665.82 0.00, 676.56 1.50, 591.36 603... | 1 |
| 2 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 0 | POLYGON ((182.62 324.15, 194.25 323.52, 197.97... | 1 |
| 3 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 1 | POLYGON ((92.99 96.94, 117.20 99.64, 114.72 12... | 1 |
| 4 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 2 | POLYGON ((0.82 29.96, 3.48 40.71, 2.80 51.00, ... | 1 |
| 5 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 0 | POLYGON ((476.88 884.61, 485.59 877.64, 490.50... | 1 |
| 6 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 1 | POLYGON ((459.45 858.97, 467.41 853.09, 463.37... | 1 |
| 7 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 2 | POLYGON ((407.34 754.17, 434.90 780.55, 420.27... | 1 |
| 8 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 3 | POLYGON ((311.00 760.22, 318.38 746.78, 341.02... | 1 |
| 9 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | 4 | POLYGON ((490.49 742.67, 509.81 731.14, 534.12... | 1 |
The only difference between the ground truth CSV format and the prediction CSV format is the `Confidence` column, which can be used to provide prediction confidence for each polygon. Alternatively, it can be set to 1 for all polygons to indicate equal confidence.
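For example, a proposal DataFrame lacking the column can simply be given a constant `Confidence` of 1. The frame below is a hypothetical sketch, not the sample data:

```python
import pandas as pd

# Hypothetical proposal frame without a Confidence column.
props = pd.DataFrame({
    'ImageId': ['chip_001', 'chip_001'],
    'BuildingId': [0, 1],
    'PolygonWKT_Pix': ['POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))',
                       'POLYGON ((20 20, 30 20, 30 30, 20 30, 20 20))'],
})
# Mark all predictions as equally confident.
props['Confidence'] = 1
```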
Running eval on the Off-Nadir challenge: Python API¶
`cw-eval` currently contains code for scoring proposals from the Off-Nadir Building Detection challenge. There are two ways to run scoring: using the Python API or using the CLI (covered later in this recipe). Below is an example using the Python API.
If you provide proposals and ground truth formatted as described earlier, no additional arguments are required unless you would like to alter the default scoring settings. If so, see the API docs linked above.
The scoring function provides two outputs:

- `results_DF`, a summary Pandas DataFrame with scores for the entire AOI split into the nadir/off-nadir/very off-nadir bins.
- `results_DF_Full`, a DataFrame with chip-by-chip score outputs for detailed analysis.

For large AOIs this function takes a fair amount of time to run.
[4]:
results_DF, results_DF_Full = eval_off_nadir(proposals_path, ground_truth_path)
100%|██████████| 33/33 [00:14<00:00, 2.11it/s]
[5]:
results_DF
[5]:
| nadir-category | F1Score | FalseNeg | FalsePos | Precision | Recall | TruePos |
|----------------|---------|----------|----------|-----------|--------|---------|
| Nadir | 1.0 | 0 | 0 | 1.0 | 1.0 | 2319 |
(This ground truth dataset only contained nadir imagery, hence the absence of the other bins)
[6]:
results_DF_Full.head(10)
[6]:
|   | F1Score | FalseNeg | FalsePos | Precision | Recall | TruePos | imageID | iou_field | nadir-category |
|---|---------|----------|----------|-----------|--------|---------|---------|-----------|----------------|
| 0 | 1.0 | 0 | 0 | 1.0 | 1.0 | 96 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | iou_score | Nadir |
| 1 | 1.0 | 0 | 0 | 1.0 | 1.0 | 3 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | iou_score | Nadir |
| 2 | 1.0 | 0 | 0 | 1.0 | 1.0 | 43 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | iou_score | Nadir |
| 3 | 1.0 | 0 | 0 | 1.0 | 1.0 | 67 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | iou_score | Nadir |
| 4 | 1.0 | 0 | 0 | 1.0 | 1.0 | 3 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | iou_score | Nadir |
| 5 | 1.0 | 0 | 0 | 1.0 | 1.0 | 91 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | iou_score | Nadir |
| 6 | 1.0 | 0 | 0 | 1.0 | 1.0 | 80 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | iou_score | Nadir |
| 7 | 1.0 | 0 | 0 | 1.0 | 1.0 | 96 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | iou_score | Nadir |
| 8 | 1.0 | 0 | 0 | 1.0 | 1.0 | 112 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | iou_score | Nadir |
| 9 | 1.0 | 0 | 0 | 1.0 | 1.0 | 78 | Atlanta_nadir8_catid_10300100023BC100_743501_3... | iou_score | Nadir |
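The chip-by-chip frame lends itself to post-hoc analysis, for example ranking chips by F1 score to find where a model struggles. The sketch below uses mock values in a `results_DF_Full`-style frame rather than real `cw_eval` output:

```python
import pandas as pd

# Mock results_DF_Full-style frame (values are illustrative, not real output).
full = pd.DataFrame({
    'imageID': ['chip_a', 'chip_b', 'chip_c'],
    'F1Score': [1.00, 0.62, 0.88],
    'TruePos': [96, 40, 70],
    'FalsePos': [0, 12, 5],
    'FalseNeg': [0, 18, 9],
})

# Rank chips from worst to best F1 to surface problem areas.
worst_first = full.sort_values('F1Score')
print(worst_first['imageID'].iloc[0])  # chip_b: the lowest-scoring chip
```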
Running eval on the Off-Nadir Challenge using the CLI¶
The `cw-eval` CLI allows competition scoring without even needing to open a Python shell. Its usage is as follows:
$ spacenet_eval --proposal_csv [proposal_csv_path] --truth_csv [truth_csv_path] --output_file [output_csv_path]
Argument details:

- `--proposal_csv`, `-p`: Path to the proposal CSV. Required argument. See the API usage details above for CSV specifications.
- `--truth_csv`, `-t`: Path to the ground truth CSV. Required argument. See the API usage details above for CSV specifications.
- `--output_file`, `-o`: Path to save the output CSVs to. This script will produce two CSV outputs: `[output_file].csv`, which is the summary DataFrame described above, and `[output_file]_full.csv`, which contains the chip-by-chip scoring results.
Not implemented yet: The CLI also provides a `--challenge` argument, which is not yet implemented but will be available in future versions to enable scoring of other SpaceNet challenges.
Example:
[7]:
%%bash -s "$proposals_path" "$ground_truth_path" # ignore this line - magic line to run bash shell command
spacenet_eval --proposal_csv $1 --truth_csv $2 --output_file results # argument values taken from magic line above
F1Score FalseNeg FalsePos Precision Recall TruePos
nadir-category
Nadir 1.0 0 0 1.0 1.0 2319
Writing summary results to result.csv
Writing full results to result_full.csv
100%|██████████| 33/33 [00:17<00:00, 1.16it/s]