Morning Hubski! Welcome to a Thursday edition.
We're still trying to pin down the configuration settings for our new "sync" functionality (think SQL replication without any of the benefits of having it done for you). That essentially means going through each table, column by column, and determining how a 'key' can be derived from the data inside. It's really tricky! Sometimes you forget exactly what users can change, and you're left with a bunch of 'dupes' once your keys go bad. Luckily, I've been doing this kind of thing in the database for over eight months now, and I've become pretty good at eyeballing what a good key will be. Duplicates are a man-made construct anyway.
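To make the 'dupes' failure concrete, here is a minimal Python sketch, assuming a sync that upserts rows into a target keyed on a derived natural key. The table, the columns (`rig_serial`, `nickname`, `depth_ft`), and the `derive_key`/`sync` helpers are all hypothetical. It shows what happens when a user-editable column sneaks into the key: the edited row gets a new key, and the target ends up with two copies of the same rig.

```python
def derive_key(row, key_columns):
    """Build a natural key from the chosen columns of a row."""
    return tuple(row[col] for col in key_columns)


def sync(source_rows, target, key_columns):
    """Upsert each source row into `target`, a dict keyed by the derived key.

    Hypothetical stand-in for a replication step: a matching key means
    "update", an unseen key means "insert".
    """
    for row in source_rows:
        target[derive_key(row, key_columns)] = dict(row)
    return target


# Hypothetical source table with one rig.
source = [{"rig_serial": "R-100", "nickname": "Big Red", "depth_ft": 1200}]

stable, fragile = {}, {}
sync(source, stable, ["rig_serial"])               # key the user can't change
sync(source, fragile, ["rig_serial", "nickname"])  # key includes an editable column

source[0]["nickname"] = "Red"  # a user renames the rig
sync(source, stable, ["rig_serial"])
sync(source, fragile, ["rig_serial", "nickname"])

print(len(stable))   # 1
print(len(fragile))  # 2: the renamed row was inserted as a "new" rig
```

The stable key updates the existing row in place; the fragile one leaves the old "Big Red" row behind as a duplicate.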
Bugs have been steadily rolling in; we went live with around 20 rigs in the past four months. After doing my small part of the overall sync configuration, I've been tracking one bug that is spewing orphaned records into a couple of tables at a pretty high rate. Right now I suspect a third of the table (530,000-odd rows) is actually garbage. I plan to exterminate the bad data and correct the module(s) causing it. I think it's just a stored procedure with a couple of joins that aren't explicit enough.
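For the orphan hunt, a common pattern is a LEFT JOIN from child to parent that keeps only the rows with no matching parent. Below is a minimal sketch using Python's built-in `sqlite3` module; the `jobs`/`job_readings` tables and the `count_orphans`/`delete_orphans` helpers are hypothetical stand-ins, and the real tables and SQL dialect will differ. On a table with ~530,000 suspect rows, it's probably worth running the count (and taking a backup) before the delete.

```python
import sqlite3

# Hypothetical parent/child tables; job_id 3, NULL, and 99 have no parent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE job_readings (id INTEGER PRIMARY KEY, job_id INTEGER, value REAL);
    INSERT INTO jobs (id, name) VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO job_readings (job_id, value) VALUES
        (1, 10.0), (2, 20.0), (3, 30.0), (NULL, 40.0), (99, 50.0);
""")


def count_orphans(conn):
    # LEFT JOIN keeps every child row; a NULL j.id means no matching job exists.
    sql = """
        SELECT COUNT(*)
        FROM job_readings AS r
        LEFT JOIN jobs AS j ON j.id = r.job_id
        WHERE j.id IS NULL
    """
    return conn.execute(sql).fetchone()[0]


def delete_orphans(conn):
    # NOT EXISTS also removes rows whose job_id is NULL.
    cur = conn.execute("""
        DELETE FROM job_readings
        WHERE NOT EXISTS (SELECT 1 FROM jobs AS j WHERE j.id = job_readings.job_id)
    """)
    conn.commit()
    return cur.rowcount


print(count_orphans(conn))   # 3
print(delete_orphans(conn))  # 3
print(count_orphans(conn))   # 0
```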
On the side, I'm working on a database-driven web scraper that I hope will be pretty dynamic and configurable. The overall idea is that you configure the scraper to visit websites, and it tries to scrape out whatever you've defined as its domain model. If a website changes, no big deal: go back, re-evaluate your XPaths, redefine the node commands, and voilà, it's back up and running. I'm hoping I can spin up dynamic RESTful services to serve as an ad hoc API for websites that don't have one. It has a long way to go.
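A minimal sketch of the "domain model as configuration" idea, assuming each field is just a name mapped to a path expression. The `PRODUCT_MODEL` fields, the `scrape` helper, and the sample page are hypothetical. It uses the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset and needs well-formed markup; real-world HTML would likely need a more lenient parser such as `lxml.html`.

```python
import xml.etree.ElementTree as ET

# Hypothetical scraper config; in a database-driven setup these entries
# would be rows in a table rather than a literal dict.
PRODUCT_MODEL = {
    "title": ".//h1[@class='title']",
    "price": ".//span[@class='price']",
    "sku": ".//div[@id='details']/span[@class='sku']",
}


def scrape(markup, model):
    """Apply each configured path to the page and return a field -> text dict."""
    root = ET.fromstring(markup)
    result = {}
    for field, path in model.items():
        node = root.find(path)  # first matching element, or None
        result[field] = node.text.strip() if node is not None and node.text else None
    return result


page = """
<html><body>
  <h1 class="title">Widget</h1>
  <div id="details">
    <span class="price">9.99</span>
    <span class="sku">W-1</span>
  </div>
</body></html>
"""

print(scrape(page, PRODUCT_MODEL))
# {'title': 'Widget', 'price': '9.99', 'sku': 'W-1'}
```

When a site's layout changes, only the path strings in the model need updating; the scraping code itself stays the same.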
What are you working on today?
Rather to my surprise, I wrote a hundred lines of code between getting up and going to work this morning. I'm building an environment to help teach my students programming. It consists of panels: on the left you define functions, and on the right a repl-like thingy lets you call those functions, see their results, click a result to expand a trace of the computation, and zoom in and out of that trace. Eventually I hope to be able to edit a function and automatically rerun all the commands in the 'session', flagging any changes or errors. Automatic unit tests for a fraction of the effort.
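The "rerun the session and flag changes" idea is essentially record-and-replay regression testing. Here is a minimal Python sketch, assuming a session is a list of recorded calls (function name, arguments, result); `record`, `replay`, and `square` are hypothetical names, not the actual environment's code.

```python
def record(func, *args):
    """Call func and remember (name, args, result) for later replay."""
    return (func.__name__, args, func(*args))


def replay(session, functions):
    """Re-run each recorded call against the current function definitions.

    session: list of (func_name, args, recorded_result)
    functions: dict mapping func_name -> current callable
    Returns a list of (call, status, detail) where status is
    "same", "changed", or "error".
    """
    report = []
    for name, args, expected in session:
        call = f"{name}({', '.join(map(repr, args))})"
        try:
            actual = functions[name](*args)
        except Exception as exc:
            report.append((call, "error", repr(exc)))
            continue
        if actual == expected:
            report.append((call, "same", actual))
        else:
            report.append((call, "changed", (expected, actual)))
    return report


def square(x):
    return x * x


session = [record(square, 2), record(square, 3)]


# A student "edits" square and introduces a bug.
def square(x):
    return x * 2


for entry in replay(session, {"square": square}):
    print(entry)
# ('square(2)', 'same', 4)
# ('square(3)', 'changed', (9, 6))
```

Note that `square(2)` still passes by coincidence, which is why replaying the whole session (rather than one example) catches more regressions.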
As of this morning, the left side is done. Screenshot:
I'm going to start wiring up the right side next. (The repl and trace browser are already done as standalone commands.)