The Voting Information Project works with states to generate XML files of voter information that can easily weigh in at five hundred megabytes. These files are generated by IT resources in each state and need to conform to the Voting Information Data Specification. The issues this sort of distributed generation brings up range from large file handling and processing, to feedback for the end user (the generator of the file), to automated syntactic and semantic validation of XML.
Alone, many of these items are non-trivial obstacles to overcome. Taken together, they can seriously complicate any project. This article works through the process at a fairly high level from a technology-centric viewpoint, describing particular technologies and methodologies that can be employed to complete each phase of data generation, delivery, and validation.
Accepting a file of this size through a standard upload form over HTTP is possible and, given fast connections on both the state office side and the VIP-hosted tools side, quite acceptable. Adobe’s Flash can be used for client-side file browsing, upload kick-off, and status tracking in a manner that is consistent and reliable across browsers.
On the server side, any number of technologies can be used to accept the incoming file without timing out or corrupting the data in the process. The practice of accepting incoming data in chunks and streaming it to disk, rather than holding it in memory, can make incredibly large file uploads an easily attainable goal. This would allow the possible size of state data files to reach, and perhaps even exceed, five gigabytes, while still being managed within the same system and application.
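The chunk-and-stream practice described above can be sketched in a few lines of Python. This is a minimal illustration rather than a description of any VIP tooling: the function name `stream_to_disk`, the 1 MiB chunk size, and the SHA-256 checksum are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an assumed, tunable value


def stream_to_disk(source, dest_path, chunk_size=CHUNK_SIZE):
    """Copy a file-like object to dest_path one chunk at a time.

    Only one chunk is held in memory at any moment, so memory use does
    not grow with the size of the upload. Returns (bytes_written,
    sha256_hex) so the caller can compare the result against a size or
    checksum supplied by the uploader.
    """
    digest = hashlib.sha256()
    written = 0
    with open(dest_path, "wb") as out:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # empty bytes means end of stream
                break
            out.write(chunk)
            digest.update(chunk)
            written += len(chunk)
    return written, digest.hexdigest()
```

Here `source` could be the request body stream exposed by whatever web framework is in use. Comparing the returned checksum with one computed by the sender is one way to detect corruption in transit.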
Automation is the desire of any quality application developer or system engineer. If a state has the resources and know-how to automatically retrieve the information and then build its XML file, it might also want to submit the file automatically at the end of that build process. Accepting incoming files, returning each file’s unique ID, and sending validation information or errors back to the state user’s application could be a useful way to allow further automation of the validation and document update process. If a state’s system can update the information once a week, or perhaps once a day, the information provided by the Voting Information Project can be as fresh as possible for consumer applications.
The API built by the Voting Information Project to handle this administrative update process might take a different form than the API that a developer-as-consumer might desire. For the administrative case, an XML response would be a bit more valuable and more applicable to the standard use case. A developer looking to consume VIP data, on the other hand, might be working within a server-side or client-side technology and so would get more mileage out of something like JSON. JSON may also make the barrier to application creation significantly lower for beginning developers using today’s popular JavaScript frameworks, whose built-in functions expect JSON.
When the Voting Information Project service validates the XML and its data upon receipt, such a large dataset will require enough processor cycles to make shared or entry-level servers a poor substitute for collocated servers or cloud-based processing utilities.
A cloud-based system can be used to process the data files received from client states. File submission will occur at irregular intervals, so a cloud-based system will allow processing power to scale so that submissions, no matter their number or size, can be turned around in a short amount of time. A collocated, single-server solution would have to use queuing to handle this same workload. During periods of heavy or frequent submissions, a queuing-based system would develop a significant backlog of processing requests.
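The accept, identify, and report-back cycle described above could look roughly like the following. It is a sketch under stated assumptions: the `Submission` and `SubmissionRegistry` names, the status strings, and the in-memory dictionary are all hypothetical stand-ins for whatever storage and API layer a real service would use.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Submission:
    submission_id: str
    state: str                      # e.g. a state abbreviation
    status: str = "received"        # received -> accepted / rejected
    errors: list = field(default_factory=list)


class SubmissionRegistry:
    """In-memory stand-in for a real datastore (assumption)."""

    def __init__(self):
        self._items = {}

    def accept(self, state):
        # Hand back a unique ID right away so the state's build script
        # can poll for validation results later.
        sub = Submission(submission_id=uuid.uuid4().hex, state=state)
        self._items[sub.submission_id] = sub
        return sub.submission_id

    def record_result(self, submission_id, errors):
        sub = self._items[submission_id]
        sub.errors = list(errors)
        sub.status = "rejected" if sub.errors else "accepted"

    def status(self, submission_id):
        sub = self._items[submission_id]
        return {"id": sub.submission_id, "state": sub.state,
                "status": sub.status, "errors": sub.errors}
```

A state's nightly job would call something like `accept` when it uploads, keep the returned ID, and check `status` until the file is accepted or a list of errors comes back.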
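To make the XML-versus-JSON distinction concrete, the same validation report can be rendered both ways from one data structure. The report fields (`id`, `status`, `errors`) and the element names below are invented for illustration and are not taken from any VIP API.

```python
import json
import xml.etree.ElementTree as ET


def report_as_json(report):
    """Serialize a report dict for JavaScript-oriented consumers."""
    return json.dumps(report, sort_keys=True)


def report_as_xml(report):
    """Serialize the same report for XML-oriented state tooling."""
    root = ET.Element("validation_report",
                      id=report["id"], status=report["status"])
    for message in report["errors"]:
        ET.SubElement(root, "error").text = message
    return ET.tostring(root, encoding="unicode")


# Hypothetical report contents, used only to exercise both renderers.
report = {"id": "abc123", "status": "rejected",
          "errors": ["precinct 12 has no polling location"]}
print(report_as_json(report))
print(report_as_xml(report))
```

Keeping a single internal representation and choosing the serializer per audience means the administrative and consumer APIs cannot drift apart in content, only in format.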
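A rough back-of-the-envelope model shows why a burst of submissions hurts a single queued server. It assumes, purely for illustration, that all files arrive at once and each takes the same time to process; the function name and the numbers are hypothetical.

```python
import math


def turnaround_minutes(n_files, minutes_per_file, workers):
    """Time until the last file finishes when all arrive together.

    Files are processed in rounds of `workers` files at a time, so the
    last file waits for every earlier round to complete.
    """
    rounds = math.ceil(n_files / workers)
    return rounds * minutes_per_file


# Ten states submit at once; each file is assumed to take 30 minutes.
single_server = turnaround_minutes(10, 30, workers=1)    # queued
scaled_out = turnaround_minutes(10, 30, workers=10)      # one worker per file
print(single_server, scaled_out)
```

Under these assumptions the last state waits ten times longer on the single server, which is the backlog the paragraph above describes.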
Cloud-based services such as EC2 from Amazon or Cloud Servers from Rackspace charge on a time-unit basis and offer APIs for automatically starting up server instances and configuring services. These servers can be spun up, and paid for, only when a client state submits new data.
Syntactic validation is actually the lesser of the concerns within this entire process. Available validation tools, such as lxml, can take the XSD files already created by the Voting Information Project and ensure that the data submitted by constituent states meets the requirements of the XSD and, to a lesser extent, the specification itself.
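As a minimal sketch of XSD validation with lxml, the snippet below checks a document against a schema and collects the error messages. The tiny inline schema and its `precinct` element are invented for the example; a real deployment would load the project's published XSD files instead. The sketch also parses the whole document into memory, which may not be appropriate for files of several hundred megabytes.

```python
from lxml import etree

# Hypothetical stand-in schema; NOT the real VIP XSD.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="precinct">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:integer" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))


def xsd_errors(xml_bytes, schema):
    """Return a list of human-readable problems; empty means valid."""
    try:
        doc = etree.fromstring(xml_bytes)
    except etree.XMLSyntaxError as exc:
        return ["not well-formed: %s" % exc]
    if schema.validate(doc):
        return []
    # error_log holds one entry per schema violation found
    return ["line %d: %s" % (e.line, e.message) for e in schema.error_log]
```

The returned list is the kind of feedback that could be sent straight back to the state that generated the file.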
Semantic validation is the truly difficult portion of this project to work through and solve. The more automated semantic checks that can be completed on the data upon receipt, the more likely that consumer error reports can be limited. In this way, incomplete or potentially conflicting data can be rejected upon receipt from a constituent state and that state’s processes can be notified to correct and submit a new data set immediately. This sort of short turnaround in data set error reporting could be very valuable, especially in the weeks leading up to an election.
Three technologies, ORA-SS, Schematron, and OCL/UML, could be used to define a semantic specification, and received data could then be checked against that specification automatically. Because validation tools for ORA-SS and OCL/UML are few and far between, and the techniques remain mostly the realm of academia, Schematron would be the format of choice for this portion of the project. Schematron source files can be transformed into XSLT files, and applying those XSLT files to an XML file performs the semantic validation. This way any standard XSLT-capable XML tool can be used for the semantic validation of state data sets.
Once a fairly comprehensive syntactic and semantic validator for Voting Information Project datasets has been deployed, a further valuable service becomes possible. The receiving party could auto-correct any discovered syntactic or semantic issues that fall within a scope of correction requiring no human intervention or decision.
In this manner, the Voting Information Project could have a third state beyond “file valid and accepted” or “file invalid and rejected”. This third state would return a message of “file invalid but accepted with the following corrections performed”. The state data set providers could determine the threshold at which they decide to provide cleaner or “more correct” data. This would probably be a decision based on whether or not the system is applying the proper fixes to the data set.
By following a measured and reasoned approach, the members of the Voting Information Project and its constituent states can move this large and complex process forward and meet its goals. As the process is further refined and the technical barriers to entry are lowered, more states can be brought into the group of constituent states with greater ease.
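A small Schematron rule makes the idea of a semantic check concrete. The sketch below uses lxml's `isoschematron` module, which compiles the Schematron source to XSLT internally. The element names (`feed`, `precinct`, `polling_location`, `polling_location_id`) are assumptions chosen for illustration and are not taken from the VIP specification.

```python
from lxml import etree, isoschematron

# Hypothetical rule: every precinct must point at a polling location
# that actually exists elsewhere in the same file.
RULES = b"""<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="precinct">
      <assert test="polling_location_id = /feed/polling_location/@id">
        precinct refers to a polling location that is not in the file
      </assert>
    </rule>
  </pattern>
</schema>"""

schematron = isoschematron.Schematron(etree.fromstring(RULES))


def semantically_valid(xml_bytes):
    """True when every Schematron assertion holds for the document."""
    return schematron.validate(etree.fromstring(xml_bytes))
```

A file like this would pass an XSD check whether or not the referenced ID exists, which is why a cross-reference rule of this kind belongs in the semantic layer.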
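The three outcomes can be modeled as a simple decision over the issues a validator reports. In this sketch the issue codes, the `fixers` mapping, and the `decide` function are all hypothetical; the only parts taken from the text are the three status messages.

```python
from enum import Enum


class Outcome(Enum):
    ACCEPTED = "file valid and accepted"
    REJECTED = "file invalid and rejected"
    ACCEPTED_WITH_CORRECTIONS = (
        "file invalid but accepted with the following corrections performed")


def decide(issues, fixers):
    """Pick an outcome for a list of issue codes.

    `fixers` maps an issue code to a description of the automatic fix.
    Any issue without a fixer needs a human, so the file is rejected.
    Returns (outcome, details): the unfixable codes on rejection, or
    the corrections applied otherwise.
    """
    unfixable = [code for code in issues if code not in fixers]
    if unfixable:
        return Outcome.REJECTED, unfixable
    corrections = [fixers[code] for code in issues]
    if corrections:
        return Outcome.ACCEPTED_WITH_CORRECTIONS, corrections
    return Outcome.ACCEPTED, []
```

Returning the list of corrections alongside the outcome gives state providers what they need to judge whether the automatic fixes are the right ones.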
