Cityscapes: Urban Scene Segmentation

Organized by buntar29



The Cityscapes Dataset focuses on semantic understanding of urban street scenes. In the following, we give an overview of the design choices that were made to target the dataset's focus.


Type of annotations

  • Semantic
  • Instance-wise
  • Dense pixel annotations



Diversity

  • 50 cities
  • Several months (spring, summer, fall)
  • Daytime
  • Good/medium weather conditions
  • Manually selected frames
    • Large number of dynamic objects
    • Varying scene layout
    • Varying background

Volume

  • 5,000 annotated images with fine annotations
  • 20,000 annotated images with coarse annotations

Benchmark suite and evaluation server

  • Pixel-level semantic labeling
  • Instance-level semantic labeling

Evaluation Criteria

Pixel-Level Semantic Labeling Task

The first Cityscapes task involves predicting a per-pixel semantic labeling of the image without considering higher-level object instance or boundary information.


To assess performance, we rely on the standard Jaccard Index, commonly known as the PASCAL VOC intersection-over-union metric IoU = TP / (TP + FP + FN) [1], where TP, FP, and FN are the numbers of true positive, false positive, and false negative pixels, respectively, determined over the whole test set. Owing to the two semantic granularities, i.e., classes and categories, we report two separate mean performance scores: IoU_category and IoU_class. In either case, pixels labeled as void do not contribute to the score.
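The class-level IoU above can be computed from a confusion matrix accumulated over the test set. The following is a minimal sketch of that computation, not the official Cityscapes evaluation scripts; the function names and the void label value of 255 are illustrative assumptions:

```python
import numpy as np

def confusion_matrix(gt: np.ndarray, pred: np.ndarray, num_classes: int,
                     void_label: int = 255) -> np.ndarray:
    """Accumulate a confusion matrix over one label map, ignoring void pixels.

    conf[i, j] counts pixels with ground-truth class i predicted as class j.
    """
    valid = gt != void_label  # void pixels do not contribute to the score
    idx = gt[valid].astype(int) * num_classes + pred[valid].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def per_class_iou(conf: np.ndarray) -> np.ndarray:
    """IoU = TP / (TP + FP + FN) per class, NaN for classes absent from both maps."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp  # predicted as the class, but ground truth differs
    fn = conf.sum(axis=1) - tp  # ground truth is the class, but predicted otherwise
    denom = tp + fp + fn
    return np.where(denom > 0, tp / denom, np.nan)
```

In practice the confusion matrices of all test images are summed before taking IoUs, so the metric is determined over the whole test set rather than averaged per image.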

It is well-known that the global IoU measure is biased toward object instances that cover a large image area. In street scenes with their strong scale variation this can be problematic. Specifically for traffic participants, which are the key classes in our scenario, we aim to evaluate how well the individual instances in the scene are represented in the labeling. To address this, we additionally evaluate the semantic labeling using an instance-level intersection-over-union metric iIoU = iTP / (iTP + FP + iFN). Again iTP, FP, and iFN denote the numbers of true positive, false positive, and false negative pixels, respectively. However, in contrast to the standard IoU measure, iTP and iFN are computed by weighting the contribution of each pixel by the ratio of the class's average instance size to the size of the respective ground truth instance. It is important to note here that unlike the instance-level task below, we assume that the methods only yield a standard per-pixel semantic class labeling as output. Therefore, the false positive pixels are not associated with any instance and thus do not require normalization. The final scores, iIoU_category and iIoU_class, are obtained as the means for the two semantic granularities.
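The instance-size weighting can be sketched as follows for a single class on a single image, assuming a per-pixel ground-truth instance-id map is available and the class's average instance size has been precomputed. This is an illustrative sketch, not the official evaluation code:

```python
import numpy as np

def instance_iou_counts(gt_classes, gt_instances, pred, cls, avg_inst_size):
    """Weighted iTP/iFN and unweighted FP pixel counts for one class.

    gt_classes, gt_instances, pred: integer per-pixel maps of equal shape.
    avg_inst_size: average instance size of class `cls`, in pixels.
    Each ground-truth instance is weighted by avg_inst_size / its own size,
    so small instances count as much as large ones.
    """
    itp = ifn = 0.0
    # FP pixels belong to no ground-truth instance, so they stay unweighted
    fp = float(np.sum((pred == cls) & (gt_classes != cls)))
    for inst_id in np.unique(gt_instances[gt_classes == cls]):
        inst_mask = (gt_instances == inst_id) & (gt_classes == cls)
        w = avg_inst_size / inst_mask.sum()
        itp += w * np.sum(inst_mask & (pred == cls))
        ifn += w * np.sum(inst_mask & (pred != cls))
    return itp, fp, ifn

def instance_iou(itp, fp, ifn):
    """iIoU = iTP / (iTP + FP + iFN)."""
    return itp / (itp + fp + ifn)
```

The counts are accumulated over the whole test set before the final division, mirroring the standard IoU computation.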

Instance-Level Semantic Labeling Task

In the second Cityscapes task we focus on simultaneously detecting objects and segmenting them. This is an extension to both traditional object detection, since per-instance segments must be provided, and pixel-level semantic labeling, since each instance is treated as a separate label. Therefore, algorithms are required to deliver a set of detections of traffic participants in the scene, each associated with a confidence score and a per-instance segmentation mask.


To assess instance-level performance, we compute the average precision on the region level (AP [2]) for each class and average it across a range of overlap thresholds to avoid a bias towards a specific value. Specifically, we follow [3] and use 10 different overlaps ranging from 0.5 to 0.95 in steps of 0.05. The overlap is computed at the region level, making it equivalent to the IoU of a single instance. We penalize multiple predictions of the same ground truth instance as false positives. To obtain a single, easy-to-compare compound score, we report the mean average precision AP, obtained by also averaging over the class label set. As auxiliary scores, we add AP_50% for an overlap value of 50%, as well as AP_100m and AP_50m, where the evaluation is restricted to objects within 100 m and 50 m distance, respectively.
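The matching and averaging described above can be sketched as follows for a single class on a single image. This toy version greedily matches predictions in order of decreasing confidence, counts duplicate detections of the same ground-truth instance as false positives, and averages AP over the 10 overlap thresholds; it is an illustrative simplification, not the official evaluation script:

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Region-level IoU of two boolean instance masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def average_precision(pred_masks, scores, gt_masks, iou_thresh):
    """AP at one overlap threshold for a single class (single-image toy version)."""
    order = np.argsort(scores)[::-1]  # process predictions by decreasing confidence
    matched = set()
    tps = []
    for i in order:
        best_j, best_iou = -1, iou_thresh
        for j, gt in enumerate(gt_masks):
            if j in matched:
                continue  # each ground-truth instance may be matched only once
            iou = mask_iou(pred_masks[i], gt)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:
            matched.add(best_j)
            tps.append(1)
        else:
            tps.append(0)  # unmatched or duplicate prediction: false positive
    tps = np.array(tps, dtype=float)
    cum_tp = np.cumsum(tps)
    precision = cum_tp / (np.arange(len(tps)) + 1)
    recall = cum_tp / max(len(gt_masks), 1)
    # step-wise area under the precision-recall curve
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)
        prev_r = r
    return ap

def mean_ap(pred_masks, scores, gt_masks):
    """Average AP over the 10 overlap thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = np.linspace(0.5, 0.95, 10)
    return float(np.mean([average_precision(pred_masks, scores, gt_masks, t)
                          for t in thresholds]))
```

The real evaluation additionally aggregates matches over all test images before computing precision-recall, and averages the resulting AP over the class label set.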

Terms and Conditions

Submissions for Phase 1 must be made before 2019-03-14 23:59:00 Moscow time, and for Phase 2 before 2019-04-09 23:59:00 Moscow time. You may make 20 submissions per day and 200 in total.

Pixel-Level Semantic Labeling Task

Start: Feb. 10, 2019, midnight

Description: The first Cityscapes task involves predicting a per-pixel semantic labeling of the image without considering higher-level object instance or boundary information.

Instance-Level Semantic Labeling Task

Start: July 1, 2019, midnight

Description: In the second Cityscapes task we focus on simultaneously detecting objects and segmenting them. This is an extension to both traditional object detection, since per-instance segments must be provided, and pixel-level semantic labeling, since each instance is treated as a separate label.

Additional phase

Start: Sept. 7, 2019, midnight

Competition Ends

