Current increases in crop yield may not meet future demands for food and fibre. Crop biomass has been proposed as an effective target for yield increase in temperate crops like wheat. Although measurements of crop biomass are predictors of plant growth and yield, traditional measurements are labour intensive, costly, and not amenable to large-scale screening in breeding programs. For example, a plot section is manually cut, and the tissue is dried to constant weight in an electric oven. The resulting dry weight per area is compared among hundreds or thousands of lines to find "winning" varieties. A new, faster and non-invasive approach is to use the information stored in a LiDAR point cloud of a cereal plot to predict biomass with different machine learning algorithms. However, such models must be trained on ground-truth measurements of aboveground dry biomass. The current dataset is a collection of dry weight data from a field trial specifically designed to encompass variation in canopy architecture, biomass and height, for use in models that predict biomass from LiDAR. This variation was achieved by using different varieties of two crops, triticale and wheat, sown at two different rates. Moreover, LiDAR and dry weight measurements were performed at two different developmental stages to enlarge the "trait space".
Lineage: Twenty-six varieties of small grain cereals (Triticum aestivum [wheat] and triticale) were sown at two different densities to generate a range of aboveground biomass values and plant architectures. Seeding rates were 250 ('high', standard for Australian yield trials) and 50 ('low') seeds/m2. Both sub-trials were sown on 22/05/2019 in Yanco, NSW, and each crop variety was randomly replicated within each sowing density, giving 78 plots per sub-trial and 156 plots in total. Each plot was scanned at two time points in the season: early (28/08/2019, vegetative stage) and later (02/10/2019, flowering stage). Point clouds were generated with a LiDAR sensor mounted on the PhenoMobile-Lite platform, which was driven above all plots at 2 m height. Aboveground biomass was collected after each LiDAR scan by cutting plants with steak knives. The size of the "quad" cut in each plot was 1 m2 at the early stage and 0.5 m2 at the flowering stage. After cutting, the plant tissue was oven-dried at 80 °C for seven days, until constant dry weight. The dry weight was then measured on a scale to estimate aboveground biomass on an area basis (g/m2) for each cut. The biomass of the whole plot (10 m2 in area) was estimated from the quad cut dry weight per area and related to the LiDAR point cloud of the entire plot to build the prediction models. The complete dataset is composed of the following files and folders:
1) Ground_truth_data_final: this file contains the experimental layout information and the dry weight measured on an area basis. Columns: runNo, run number (column); rangeNo, range number (row); entryNo19, variety number; stage, developmental stage (Z31, early; Z65, flowering); trt, sowing density treatment (Hi, 250 seeds m-2; Lo, 50 seeds m-2); biomass_g_mSq, measured dry biomass in g m-2.
2) test_list.txt: list of 102 files with LiDAR information used as a "test" set.
3) train_list.txt: list of 204 files with LiDAR information used as a "training" set.
4) Folder "Yanco_TC_2019_HI-pcd": this folder tree contains files with raw LiDAR information (.pcd). The files are organized in two folders by date: 20190828 and 20191002. Each date folder has 13 subfolders, one per run (experiment "columns"), holding the LiDAR data for the individual plots in that run. The files are named "runNo-rangeNo-….pcd". For example, the file located at "20191002//Tony e-w_20091002_005//5-6-1-b.pcd" can be interpreted as LiDAR data for the plot at runNo 5, rangeNo 6, scanned on 02/10/2019.
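
The file naming convention in item 4 can be turned into plot identifiers that match the runNo, rangeNo and stage columns of the ground-truth file. The following is a minimal Python sketch, not part of the dataset: the helper name parse_pcd_path is hypothetical, and it assumes that the scan date is the path component made of exactly eight digits and that every file name begins with "runNo-rangeNo-". The remainder of the file name (for example "1-b") is not documented here and is left uninterpreted.

```python
import re
from datetime import datetime

# Scan-date folder -> developmental stage, as stated in the dataset description.
STAGE_BY_DATE_FOLDER = {"20190828": "Z31", "20191002": "Z65"}


def parse_pcd_path(path):
    """Extract runNo, rangeNo, scan date and stage from a .pcd path.

    Hypothetical helper written for illustration only.
    """
    # Drop empty parts so that doubled slashes ("//") are tolerated.
    parts = [p for p in path.split("/") if p]
    if not parts:
        raise ValueError("empty path")

    # Assumption: the date folder is the component made of exactly 8 digits.
    date_folder = next((p for p in parts if re.fullmatch(r"\d{8}", p)), None)
    if date_folder is None:
        raise ValueError(f"no date folder found in {path!r}")

    # Assumption: the file name starts with "<runNo>-<rangeNo>-".
    match = re.fullmatch(r"(\d+)-(\d+)-.*\.pcd", parts[-1])
    if match is None:
        raise ValueError(f"unexpected file name in {path!r}")

    return {
        "runNo": int(match.group(1)),
        "rangeNo": int(match.group(2)),
        "scan_date": datetime.strptime(date_folder, "%Y%m%d").date(),
        # None if the folder is not one of the two documented scan dates.
        "stage": STAGE_BY_DATE_FOLDER.get(date_folder),
    }


# The example path from the description above.
example = parse_pcd_path("20191002//Tony e-w_20091002_005//5-6-1-b.pcd")
```

The returned runNo, rangeNo and stage values could then serve as a key for looking up the matching biomass_g_mSq entry in Ground_truth_data_final.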
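
The conversion from a quad cut to whole-plot biomass described under Lineage can be illustrated with a short calculation. This is a sketch under stated assumptions, not the authors' processing code: it assumes simple linear scaling by area (dry weight divided by the cut area, then multiplied by the 10 m2 plot area), the function names are hypothetical, and the 150 g dry weight in the example is a made-up value rather than a measurement from the dataset.

```python
# Areas taken from the dataset description.
PLOT_AREA_M2 = 10.0
QUAD_AREA_M2 = {"Z31": 1.0, "Z65": 0.5}  # early and flowering stage cuts


def biomass_per_area(dry_weight_g, stage):
    """Dry weight of one quad cut expressed in g per m2 (hypothetical helper)."""
    return dry_weight_g / QUAD_AREA_M2[stage]


def whole_plot_biomass(biomass_g_m2, plot_area_m2=PLOT_AREA_M2):
    """Estimated dry biomass of the whole plot in g.

    Assumption: biomass is uniform across the plot, so the estimate
    scales linearly with area.
    """
    return biomass_g_m2 * plot_area_m2


# Made-up example: a 150 g flowering-stage cut taken from 0.5 m2.
per_area = biomass_per_area(150.0, "Z65")   # 300.0 g per m2
per_plot = whole_plot_biomass(per_area)     # 3000.0 g for the 10 m2 plot
```

The biomass_g_mSq column in Ground_truth_data_final already holds the per-area value, so only the second step would apply when starting from that file.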