Official Event of VIS 2013
The IEEE 2013 SciVis Contest is an official event of the IEEE VIS 2013 conference, which will be held in Atlanta, GA (October 13 - 18, 2013). This year's contest will target the exciting domain of developmental neuroscience. Specifically, contest participants will help identify spatial and temporal gene expression patterns in the developing mouse brain.
By participating in this contest, you can demonstrate your novel visualization and interaction techniques and simultaneously help to profile genes functionally relevant to brain development or developmental brain disorders.
The Allen Institute for Brain Science has made the Allen Developing Mouse Brain Atlas data set available for use for the contest, but the Institute is not an official sponsor of the contest.
IMPORTANT UPDATE - The data set has changed! See the News section below for details. IMPORTANT UPDATE 2 - The contest deadline has been extended! You now have until September 2 to make your submission. See the Submission section for details. If you are planning to make a submission, please email the contest organizers at the addresses listed in the Contact section.
The data set tracks the level of gene expression for ~2000 genes in a 3D mouse brain from embryonic stages through adulthood. These expression levels are recorded within annotated 3D regions that change size and shape (and even divide) during development. The genes are organized into 11 categories.
The ~2000 genes in the Atlas are characterized by in situ hybridization (ISH) across three embryonic and three early postnatal ages. ISH is a technique for labeling cells expressing a particular mRNA sequence. In ISH parlance, a probe is the RNA sequence hybridized to a tissue specimen, but for practical purposes a probe usually only covers a subset of the entire gene sequence. As a result, there can be multiple probes, and therefore multiple image volumes, for a single gene. Brain sections are treated and imaged at high resolution, at which point an automatic cell segmentation algorithm quantifies regional expression.
Expression levels of different genes can only be compared to each other when the images have been registered to a common reference space. Because the morphology of the developing mouse brain varies dramatically from stage-to-stage, each stage has its own reference space. To generate a reference space:
ISH images and expressing cell segmentations are similarly co-registered into a 3D volume, which is then directly aligned with the reference volume for that specimen's developmental stage. The expression values released in the contest data set are a measure of the sum of expressing pixel intensities within a reference space voxel divided by the total number of pixels in that voxel. This is called expression energy.
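The expression energy definition above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Institute's processing pipeline: the function name, the flat list of pixel intensities, and the boolean "expressing" mask are assumptions made for the example.

```python
def expression_energy(pixel_intensities, expressing_mask):
    """Sum of intensities over expressing pixels divided by ALL pixels in the voxel.

    Both arguments are hypothetical inputs: one intensity and one
    expressing/non-expressing flag per image pixel that falls inside
    a single reference space voxel.
    """
    if len(pixel_intensities) != len(expressing_mask):
        raise ValueError("intensities and mask must have the same length")
    total_pixels = len(pixel_intensities)
    if total_pixels == 0:
        raise ValueError("a voxel must contain at least one pixel")
    expressing_sum = sum(
        value for value, expressing in zip(pixel_intensities, expressing_mask) if expressing
    )
    return expressing_sum / total_pixels


# Made-up values: four pixels in the voxel, two of them segmented as expressing.
example_energy = expression_energy([10, 200, 100, 0], [False, True, True, False])
print(example_energy)  # (200 + 100) / 4 = 75.0
```

Note that the denominator counts every pixel in the voxel, not just the expressing ones, so a voxel with a few bright expressing cells and a voxel with many dim ones can end up with similar energies.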
Comparing expression values across developmental stages is a key part of the visualization challenge. Individual voxels cannot be compared across stages; however, structure-level statistics can be. One can compare the expression energies of entire structures, rather than of single voxels. This is also non-trivial, however. The ontology of developing mouse brain structures is not only hierarchical, but also time-varying. In the earliest stages of development, the mouse brain has only a few gross structures, which gradually divide into more complex structures over time. These temporal subdivisions are called levels in the data. How do you compare two developmental stages to each other when they don't have the same set of structures? This is one area where we hope you will help.
To summarize, the data set consists of:
Expression energy volumes, reference atlas volumes, and annotation volumes are stored as Meta Images (.mhd), which are readily readable by VTK, ITK, and applications like ParaView that are built atop these toolkits. Note that the reference atlas volumes have much larger dimensions than the energy and annotation volumes. Taking into account the physical spacing between voxels, however, all of the volumes have the same (physical) size. Spacing values listed in the Meta Image headers are all in micrometers.
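As a rough check of the "same physical size" statement, the sketch below parses the plain-text part of a Meta Image header and multiplies voxel counts by spacing. DimSize and ElementSpacing are standard Meta Image header fields; the two example headers and their numbers are made up for illustration and are not taken from the contest files.

```python
def parse_mhd_header(text):
    """Parse 'Key = Value' lines of a Meta Image (.mhd) header into a dict."""
    header = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        header[key.strip()] = value.strip()
    return header


def physical_extent(header):
    """Return the volume extent per axis: voxel count times spacing (micrometers)."""
    dims = [int(v) for v in header["DimSize"].split()]
    spacing = [float(v) for v in header["ElementSpacing"].split()]
    return [d * s for d, s in zip(dims, spacing)]


# Hypothetical headers: a coarse energy volume and a finer reference atlas volume.
energy_header = parse_mhd_header(
    "ObjectType = Image\nNDims = 3\nDimSize = 70 75 40\n"
    "ElementSpacing = 100 100 100\nElementType = MET_FLOAT\n"
    "ElementDataFile = energy.raw\n"
)
atlas_header = parse_mhd_header(
    "ObjectType = Image\nNDims = 3\nDimSize = 350 375 200\n"
    "ElementSpacing = 20 20 20\nElementType = MET_UCHAR\n"
    "ElementDataFile = atlas.raw\n"
)

print(physical_extent(energy_header))  # [7000.0, 7500.0, 4000.0]
print(physical_extent(atlas_header))   # [7000.0, 7500.0, 4000.0]
```

If the extents of an energy volume and its atlas volume disagree in your own download, check that you are pairing volumes from the same developmental stage.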
Some voxels in the expression energy volumes have a value of "-1". This value indicates that there was an error in the ISH processing pipeline and that there is no data in that voxel. A negative value was chosen to distinguish between "zero energy" and "error".
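In practice this means the "-1" voxels should be excluded before computing any statistic. The sketch below assumes the voxel values have already been read into a flat Python sequence; the function name and the example values are illustrative only.

```python
def mean_valid_energy(voxels):
    """Mean expression energy over voxels that actually carry data.

    Negative values mark voxels where the ISH pipeline failed, so they are
    skipped rather than averaged in. Returns (mean, valid_count); the mean
    is None when no voxel has data.
    """
    valid = [v for v in voxels if v >= 0]
    if not valid:
        return None, 0
    return sum(valid) / len(valid), len(valid)


# Made-up voxel values: one of the four voxels is flagged as an error.
example_voxels = [0.0, 2.0, -1.0, 4.0]
example_mean, example_count = mean_valid_energy(example_voxels)
print(example_mean, example_count)  # 2.0 3
```

Averaging the same four values without the filter would give 1.25 instead of 2.0, which is why the error marker must not be treated as a very low energy.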
The structure-level expression energy values are stored in a separate CSV file per probe. Each row of the CSV describes the combined expression energy of all voxels within a labeled structure. The number of structures will differ from probe to probe because a) the number of structures varies between stages and b) a structure is omitted when no data was successfully captured in it. A missing structure can only be interpreted as "no data", not "no energy".
If you would like to use the data set outside the scope of this contest, you agree to be bound by the Allen Institute's terms of use found here.
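A minimal sketch of keeping "no data" and "zero energy" apart when reading one of these CSV files is shown below. The column names structure_id and expression_energy are assumptions for the example; check the header row of the actual files and adjust them.

```python
import csv
import io


def load_structure_energies(csv_text, id_column="structure_id",
                            energy_column="expression_energy"):
    """Map structure ID -> expression energy for one probe's CSV.

    The default column names are hypothetical placeholders.
    """
    energies = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        energies[int(row[id_column])] = float(row[energy_column])
    return energies


def describe(energies, structure_id):
    """Report a missing structure as 'no data' instead of defaulting to zero."""
    if structure_id not in energies:
        return "no data"
    return "energy = {:.3f}".format(energies[structure_id])


# Made-up CSV content: structure 101 has zero energy, structure 103 is absent.
example_csv = "structure_id,expression_energy\n101,0.0\n102,1.25\n"
example_energies = load_structure_energies(example_csv)
print(describe(example_energies, 101))  # energy = 0.000
print(describe(example_energies, 103))  # no data
```

Using a dictionary lookup with an explicit membership test, rather than a default of 0, keeps the distinction visible all the way to the visualization.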
The Allen Developing Mouse Brain Atlas was developed to help the neuroscience community investigate how gene expression changes throughout the process of development in the mouse brain. The contest will focus on answering the following questions, in decreasing order of importance:
Gene expression energy volumes, annotation volumes, and reference volumes are being released as Meta Images (.mhd), which are readable by VTK, ITK, and other common software libraries. The ontology, gene meta data, and categories are stored in simple CSV files. All of this is being graciously hosted on the SDSC cloud. Gene meta data is listed on a probe-by-probe basis, so be aware that there can be multiple probes for a single gene. Visit this link to download the data:
https://cloud.sdsc.edu/v1/AUTH_sciviscontest/2013
Linux users can run this simple wget command:
$ wget -r -nH --no-parent --no-check-certificate https://cloud.sdsc.edu/v1/AUTH_sciviscontest/2013/
UPDATE (July 1) - With the release of the updated data announced in the News section, you'll see that the SDSC cloud link contains a data, data_v2, and data_v3 directory. We recommend that you only download one of these directories, as they are quite large. You could modify your wget command as follows:
$ wget -r -nH --no-parent --no-check-certificate https://cloud.sdsc.edu/v1/AUTH_sciviscontest/2013/data_v3/
All data released by the Allen Institute, including the data in this contest, was retrieved using the Institute's public API. The Python scripts that download this data set are available on GitHub:
https://github.com/AllenBrainAtlas/visweek-2013-contest
The GitHub source code also contains a helpful script demonstrating how to read the ontology, how to parse gene meta data, and how to perform other useful tasks. We recommend using this code as a starting point for getting a better understanding of the data set.
When you download the data set, you will find a set of directories:
meta/data_sets.csv is the index for all of the volume data sets. Each row is a probe for one developmental stage (reference space), and the columns describe which gene and development stage (reference space) the probe targets. meta/structures.csv lists all of the structures in the ontology as well as their hierarchical relationship. Each structure row has a structure_database_id_path column that describes the lineage of the structure all the way back to the root.
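The lineage column can be used to walk up the ontology, for example to find the deepest structure that two finer structures share. The sketch below assumes the path is a slash-delimited list of IDs running from the root to the structure itself (for example "/1/2/3/"); verify this against meta/structures.csv before relying on it. The function names and the IDs in the example are made up.

```python
def parse_id_path(path):
    """Turn a lineage string such as '/1/2/3/' into [1, 2, 3] (root first)."""
    return [int(part) for part in path.split("/") if part]


def deepest_common_ancestor(path_a, path_b):
    """Return the ID of the deepest structure shared by two lineages, or None."""
    common = None
    for id_a, id_b in zip(parse_id_path(path_a), parse_id_path(path_b)):
        if id_a != id_b:
            break
        common = id_a
    return common


# Hypothetical ontology: 1 is the root, 2 and 5 are its children,
# and 3 and 4 are subdivisions of 2 that appear at a later stage.
example_paths = {
    1: "/1/",
    2: "/1/2/",
    3: "/1/2/3/",
    4: "/1/2/4/",
    5: "/1/5/",
}

print(parse_id_path(example_paths[3]))                                # [1, 2, 3]
print(deepest_common_ancestor(example_paths[3], example_paths[4]))   # 2
print(deepest_common_ancestor(example_paths[3], example_paths[5]))   # 1
```

Rolling fine structures up to a shared ancestor is one possible way to put two stages with different subdivisions on a common footing; it is offered here as a starting point, not as the intended solution.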
Because all of the gene expression energy volumes are MHDs, you can open them in ParaView. You can open an energy volume and its associated reference atlas volume or annotation volume and they should align. You may notice that energy only covers slightly more than one hemisphere of the brain -- this was intentional. Only one hemisphere was sectioned for ISH treatment.
When this page changes we'll update this section and notify email addresses registered with the change notification widget on the left. There will also be a FAQ in this section as necessary.
The Allen Institute has reprocessed the developing mouse atlas. It has been packaged for the contest and you can download it with the instructions in the Download section. The big difference is the addition of a P28 time point, which has the same structural annotation ontology as the rest of the time points.
The Institute has also reannotated the P56 time point with a matching ontology; however, the P56 imaging modality is different from the other time points and the expression energy values are scaled differently. To keep things simple, I'm leaving out P56. If you're curious and want to know how to download it, contact the mailing list.
Participants should feel no pressure to try out the new data set, although new participants are recommended to use the latest and greatest.
Several participants have noticed that there are a large number of files in the structure_unionizes directory that have no corresponding data sets. That observation is correct; the extra files were included by mistake.
If you read the note below, you'll remember that we are in the process of re-annotating this data set for release in June. As it turns out, some of that data has already been published, and I inadvertently included it in the unionize directory.
Summary: disregard all unionize files that do not have corresponding data sets in the data_sets directory. Sorry for the confusion!
The gene categories in this data set come from the PANTHER gene database. When you look at the classifications column in data_sets.csv, you'll see a forward-slash-delimited set of classification names. A single data set can be associated with more than one category.
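Because one data set can belong to several categories, grouping by category means splitting the classifications column first. The sketch below works on rows as dictionaries, as produced by csv.DictReader. The classifications column name comes from data_sets.csv; the id column name and the category names in the example are placeholders.

```python
from collections import defaultdict


def group_by_category(rows, id_column="id"):
    """Map each category name to the list of data set IDs tagged with it.

    A data set listed under several slash-delimited categories appears in
    every matching group. The default id_column is a hypothetical name.
    """
    groups = defaultdict(list)
    for row in rows:
        for name in row["classifications"].split("/"):
            name = name.strip()
            if name:
                groups[name].append(row[id_column])
    return dict(groups)


# Made-up rows; the category names are illustrative only.
example_rows = [
    {"id": "1001", "classifications": "category A/category B"},
    {"id": "1002", "classifications": "category B"},
    {"id": "1003", "classifications": ""},
]
example_groups = group_by_category(example_rows)
print(example_groups)
# {'category A': ['1001'], 'category B': ['1001', '1002']}
```

Rows with an empty classifications value simply end up in no group, so counts per category will not necessarily add up to the number of data sets.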
The Allen Institute periodically refreshes the gene categories in its internal database. The gene categories were updated just before version 2 of the data set was released (see comment below), so you'll see a slightly larger number of categories in the new data set (32).
A participant has kindly notified us that there is a problem with the P56 time point. If you look at the ID numbers in the P56 annotation volume (annotation/annotation_10.mhd), you'll see that it has a totally different set of structure IDs than the other time points, and these IDs are not contained in meta/structures.csv. These structure IDs are from a different atlas (the Allen Institute's adult mouse gene expression atlas). The P56 structure IDs cannot be used for the developing mouse contest data.
We apologize for this error, and are working to fix it. For now, we've released a second version of the data set that does not have the P56 time point. If you browse the data set from the SDSC cloud storage link, you'll see it now contains a new directory data_v2, which does not contain the P56 time point. We've updated the download instructions accordingly.
The Allen Institute is planning to release an updated version of the developing mouse atlas in June of this year. The gene expression volumes will remain largely unchanged, however the ontology is being overhauled. This means new structure names, new structure IDs, and new annotation volumes. The P56 time point is being re-annotated to use the same ontology as the other developmental time points.
We would love to see how your visualization technique applies to all 6 refreshed time points! However, we understand that two major data set changes are a hassle, so we only ask that your submissions work on one of the released data sets. If you would like to "future-proof" your code, avoid hard-coding file names and structure IDs as much as possible.
Thank you for your patience. Again, we apologize for the error.
VisWeek has been officially renamed to VIS. The page has been updated accordingly.
Updated the awards list to include magazine article submission. Contact emails are correct now. Updated terms of use in the data description.
To demonstrate your approach, you are expected to submit:
The winning team will receive:
Depending on the number of submissions and the reviews, honorable mentions will be awarded. The honorable-mention teams will receive award certificates for each team member.
All submissions will be published in the electronic proceedings of IEEE VIS 2013. We reserve the right to exclude submissions that are of poor quality and are not useful in any part.
You can always contact the chairs at this email address: scivis_contest ieeevis org. Questions appropriate for all contest participants can go here: scivis_contest_participants ieeevis org. If you would like to join the participants mailing list, just email one of the two mailing lists.
If you have any questions about the data, submission procedure, or anything else, please do not hesitate to contact us.
Chair: Gabriel Zachmann, University of Bremen, Germany
Co-Chair: David Feng, Allen Institute for Brain Science. davidf alleninstitute org
Web and data hosting is provided by courtesy of SDSC. Thanks a lot to Amit Chourasia and Jan Klein, SciVis Contest Advisors, for their continuous support.