-
Story
-
Resolution: Fixed
-
Critical
-
0.4
-
None
-
None
-
SIdora Sprint 12, SIdora Sprint 13, SIdora Sprint 14, SIdora Sprint 15, SIdora Sprint 16, SIdora Sprint 17
-
237
-
8
We need a program (or programs) that can parse a tabular data file to determine the number, type and value ranges of variables in the file. This program would be used to initiate a new codebook so that the user would have something to start with, but it should also be usable to validate new files to be associated with the codebook.
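A minimal sketch of the parsing step, assuming comma-separated input with an optional header row and ISO 8601 dates. The names `infer_type` and `profile_table` and the shape of the returned dictionaries are illustrative only and are not taken from the schema:

```python
import csv
import io
from datetime import datetime


def infer_type(values):
    """Guess 'numeric', 'date' or 'string' for a list of non-empty cell strings."""
    def all_parse(parser):
        try:
            for v in values:
                parser(v)
            return True
        except ValueError:
            return False

    if not values:
        return "string"
    if all_parse(float):
        return "numeric"
    # Assumption: dates are encoded as YYYY-MM-DD; anything else falls back to string.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"


def profile_table(text, has_header=True):
    """Return one dict per variable: label, inferred type, and value range."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    if has_header:
        labels, data = rows[0], rows[1:]
    else:
        # No labels in the file: generate placeholder names var1, var2, ...
        labels, data = [f"var{i + 1}" for i in range(len(rows[0]))], rows
    profile = []
    for i, label in enumerate(labels):
        # Ignore empty cells and rows that are too short to reach this column.
        values = [r[i].strip() for r in data if i < len(r) and r[i].strip()]
        kind = infer_type(values)
        if kind == "numeric":
            nums = [float(v) for v in values]
            lo, hi = min(nums), max(nums)
        else:
            # ISO dates and plain strings are ranged lexicographically.
            lo, hi = (min(values), max(values)) if values else (None, None)
        profile.append({"label": label, "type": kind, "min": lo, "max": hi})
    return profile
```

The length of the returned list gives the number of variables, and each entry could be used to pre-fill one row of a new codebook form.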
This is the back-end web service and uses the schema developed by Adam Soroka. The Workbench will need to use the schema for display and management of tabular data. Also, a UI is needed to create instances (SID-302) of the schema with specific column specifications.
An example can be found in app03 and is named codebook 2015. Only cursory tests using the form builder have been done. It will need to be fully tested once it has been hooked up as part of the work on this issue.
Possible uses:
1. The Tabular Data wizard could use this program to initiate the codebook based on the first file in a batch to be uploaded with a new codebook. The user would be presented with a codebook form that already shows the correct number of variables.
If there are labels in the first row of the file they would already be filled in. If the type of data could be determined (string, numeric, date, place) that would be filled in as well. Ideally, a really smart program might even be able to identify the encoding of the date and place.
2. A batch of files that were uploaded could be parsed after ingest to determine whether they are valid for the codebook. This would probably be too time/resource consuming to be done before letting a user upload them. A message could be sent to the user telling them that they have objects that are not valid for the codebook. They could be restricted from using the objects in workflows until they fix them.
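The validation described in the second use could look roughly like the sketch below. It assumes a codebook represented as a list of `{"label", "type"}` dictionaries, which is a hypothetical stand-in for the real schema, and the function name `validate_table` is likewise illustrative:

```python
import csv
import io
from datetime import datetime


def _is_numeric(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_date(value):
    # Assumption: dates are encoded as YYYY-MM-DD.
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Hypothetical type names; the real schema may use different ones.
CHECKS = {"numeric": _is_numeric, "date": _is_date, "string": lambda value: True}


def validate_table(text, codebook):
    """Return a list of human-readable problems; an empty list means the file is valid."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    expected = [column["label"] for column in codebook]
    if header != expected:
        return [f"header {header} does not match codebook {expected}"]
    problems = []
    for row_no, row in enumerate(data, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(codebook):
            problems.append(
                f"row {row_no}: expected {len(codebook)} fields, found {len(row)}"
            )
            continue
        for column, cell in zip(codebook, row):
            cell = cell.strip()
            # Empty cells are treated as missing values, not type errors.
            if cell and not CHECKS[column["type"]](cell):
                problems.append(
                    f"row {row_no}: {column['label']}={cell!r} is not {column['type']}"
                )
    return problems
```

The returned list could feed the message sent to the user, and a non-empty result could be what restricts the object from use in workflows.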
http://si-vmatlassian.si.edu:8090/display/ORIS/Improve+tabular+data+functionality