Skip to content

Laying foundations

May 9, 2013

Well, that’s been nearly a month – nearly dropped off the edge of the world again, but now that a release and a brief holiday up north are out of the way, I’ve finally started to bash on with the solution I mentioned in the last post.

Now, I have a bit of a love/hate relationship with Scala.  I mean, the language is pretty brilliant, and has features (function passing, lenient syntax, excellent pattern matching to name a few) that I really like, I’ll always admit it can be a tad clunky.  It can be pretty slow to build for one thing, and there’s a bit of an issue with over-proliferation of class methods.  Plus… I’ve had a bad time with getting SBT – the simple build tool – scala’s homebrew answer to maven working.

But what’s life without a challenge.  So I’m going to do this properly, without falling back to pulling in all dependencies via Maven.  Fortunately, with Intellij by my side, and a clean slate to work from, I might just be able to do it.

Setting Things Up

Now, to do this properly, I want to try and keep things nice and tidy; my aim is to end up putting this up in my currently-empty github account, so we should try and make things at least comprehensible to anyone who comes across it.  So we need some kind of dependency management – I can’t help feeling it’s inefficient at best, a bit rude at worst.  So I’ll grid my lions, so to speak, and put together an SBT build script.

Here’s the ultra-basic version I’m using, with no actual dependencies defined except for the scala version (which will bring down the appropriate compile and runtime libraries automatically, as well as do funky things with scala dependencies, but later on that front);

name := “mimir”

version := “0.1”

scalaVersion := “2.10.1”

Now, one thing to bear in mind – you must leave a blank line between each property – this is SBT’s property delimiter for basic .sbt build scripts, otherwise your build will screw up in incomprehensible ways.  As written, the above script will cause a few things to happen;

  1. It’ll retrieve scala-library-2.10.1 and scala-compile-2.10.1 artifacts via Ivy from a set of standard repos.
  2. It’ll create project and target folders for you, putting resolved dependencies in the latter.
  3. If there’s a src/main/scala or src/test/scala folder already there with source in, it’ll try to compile them using the listed dependencies (currently just core java libraries and the scala-compile library).

To do this, you’ll need to run SBT itself, via the sbt-launch.jar in the root directory of your project – this is all pretty well documented here.

One further step – IDE integration.  As it stands, your IDE won’t have access to the dependencies SBT has set up for you, so it’s worth finding the SBT plugin for your preferred IDE.  I use IntelliJ these days, and you can pull in the SBT plugin by creating a file in your /project folder called plugins.sbt.  This is another build script, which kind of acts as a build script for your build.sbt.  Into this, I’ve just put;

addSbtPlugin(“com.github.mpeltonen” % “sbt-idea” % “1.4.0”)

Now when I use the sbt reload command, it’ll bring in the plugin.  I can then call gen-idea to have it generate IDEA project files, which handle the resolved dependencies.

Success!

So that’s the basic project setup done.  Next post – actual coding! 

 
 
 

if(canBeBothered) return post;

April 19, 2013

Well, it has been a while since I posted anything here, but figured I should get back into the habit at some point, and really, now is as good a time as any.  Besides, if anyone is reading this (slim chance), you might find some of the dev-based musings fairly interesting (slimmer chance; frankly, we’re talking graphene by this point).

Anyway, I’ve had a bit of an Interesting Problem recently, which I thought I should share, over a few posts.  To make things concise for now, I can summarize the events leading up to the Problem as follows;

  1. We need to store complex objects.  Arbitarily complex objects, associated with each other via a set of attributes.
  2. Each time we add an object to this store, we have to group it with its fellows.  This isn’t that hard – the linked objects are indicated quite clearly, but it does screw up any thought of simple transactions.
  3. Did I mention the number of objects?  And it really says it all that these were held in what must have been the obvious data structure at the time.  A HashMap.  Because, obviously, it’s really quick to query, right guys?  Right?
  4. Of course, we also execute complex, SQL-like queries against this, so there’s that to consider.  Of course, the easy way is to iterate through every single object and compare it to the query.  Evidently.  Oddly enough, this is surprisingly fast, given the circumstances.
  5. The end result of this is an application that… well, politely speaking, as seen better days.  It’s bloated, unwieldy, and really really slow.  -Xmx15G and OutOfMemoryException should not occur within arms reach of each other.

So, obviously, we needed a solution that was a bit better designed and a hell of a lot easier on the poor, abused virtual machine.  There were a few things here that came to mind;

  • Shifting storage responsibilities somewhere else, other than the heap.
  • Doing some kind of indexing on the store – as is,  I think my copy of Clean Code was starting to yellow and char around the edges…
  • Keeping it as quick to load on startup – no easy task given all those 5M+ entries would be loaded in batch.
  • Trying not to disrupt the model, thus causing Leviathan to arise from the depths of the JVM and the End of Days to commence.

Now, the solution is in place and working.  But obviously, that’s in the Land Of Professional Development, where fun is frowned upon, and product owners are scared of words like ‘Scala’…

So why don’t I see if I can have a bash at using just that.  With any luck I can document my attempts over the next few weeks.

Possibly.

If I can keep up some kind of posting schedule.

Back to the Roots

November 18, 2010

So it’s been a bit of a while since I last did anything on this blog.  To be precise, as near-as-makes-no-sense to eight months, which is a bit appalling, really.  To tell the truth, quite a few things have fallen by the way-side in that time, notable amongst them my in-memory-OLAP-but-not-really-OLAP implementation; by the name of Mimir.  This is based around my personal contribution to what was previously known as The Project – at least on this blog, ‘cos it didn’t really have an official name anyway.

Anyway, to re-cap, I was trying to implement this distributable, high speed, Rx-based data structure, that would mimic the interface onto a standard OLAP cube.  This structure consists of a number of Dimensions, which are grouped into Hierarachies of similar information, intersecting at a single Value for each unique combination of dimension values.  Both the Value and Dimensions are analagous to columns in a standard relational database, and the Hierarchy is an ordering of dimensions; just like you have in a standard OLAP cube.

Now, the original implementation of this was in C#, backed by an AppFabric cache for persistence and linked up to StreamInsight at the business end.  It consisted of a somewhat untidy structure, which could be subscribed to based around the location passed into the cube.  This was then wired together via the .Net Reactive Extensions.  Some of this will change.  I’ve got in mind a Mimir 2.0 implementation – major changes are as follows;

  1. Replace the AppFabric-or-Memory option with an interface-driven flexible MimirSource.  The idea is that any persistence model could be chosen and the MimirSource will sit and act as a) a delegate DAO, and b) a factory object for the appropriate CubeValues.
  2. Tidy up the configuration.  Previously everything was configured via a POCO, including some of the aggregation functions.  This will now be the duty of a cut-down config object, coupled with the MimirSource, which will handle the persistance and factory functions, along with an Evaluator which deals with value arithmetic and aggregation.
  3. Actually implement a history – which will be yet another interface-implementation.  Previously I wasn’t storing a value history of any sort, just the version tags in the CubeValue objects, but this is probably a needed feature.
  4. Expand the rehydration – previously there was some provision to reconstruct a cube when connecting to the AppFabric cache.  I want to evolve this into a more intelligent solution, loading the cube structure lazily when it’s requested.
  5. As an adjunct to the above, possibly differentiate between read-only and read-write cubes, to create a kind of client-server structure.
  6. Work on the query/subscription mechanism.  Currently it relies on simply passing in a path of a length equal to or shorter than a given hierarchy.  I want to expand this to support some degree of logical expression in the queries, to expand the applicability of the cube itself.

Tie in some general tidying up, and I think I’ve got a project for the near future.  Another one, at least.  Hopefully there’ll be more coming – watch this space.

Staring into the Abyss

March 25, 2010

So The Project continues apace, and may even see the light of day in some form or another, and I’ve pretty much finished the code for my So The Project continues apace, and may even see the light of day in some form or another, and I’ve pretty much finished the code for my hypercube. And what do I do? I have a think.

We’re already using Reactive Extensions, so I’ve got kinda familiar with building chains of Subject<T>‘s and IObservable‘s, so I figure, if we have this hypercube; which holds values of class T, why can’t we just have it hold a set of IObservable

So an hour of painful re-factoring, fixing, and general poking around and we have the DynamicCube<T> which is frankly scary.  Alright, so I haven’t tested it properly, but what we should have is a data structure that stores values corresponding to the dimensions on a hypercube as before, only, instead of updating down into the cube, we are updating values from the bottom up.  Subscriptions to sections of the cube get built onto subscriptions to the values contained within.

The scary thing is, it eventually becomes hard to think about – you have your multidimensional cube, but the values extend from other sources.  Forget even trying to draw out the data structures; I’d be more worried about my machine imploding…

Oh, and it also brings with it pure, unbridled potential for un-debuggable infinite recursion the likes of which the world has never seen.  On the whole, quite successful I feel.

Castles in the Air

March 24, 2010

One of the things I like about my field is that you do get to build these castles; constructs of raw information, positioned in a place which isn’t quite real but can be interacted with fully using the right tools.  Of course, the ephemereal nature of these constructions does mean the making of them gets a tad tricky at times…

So I’ve been working on a Project.  Not for a client per se, more the personal interest of a senior colleague (more details from said colleague’s blog here).  Of course, he calls it RTC, I call it Mimir, but it’s all the same when it comes down to shovelling data.  Anyway, despite the relatively-informal nature of the Project, it’s definately an interesting change of pace; how do you take a structure like SQL Server Analaysis Services, or Oracle OLAP, and whisk away the foundations, leaving just a complex data structure hanging in the air – basically creating an OLAP cube, without the support of an underlying database.

This might sound rather boring, and if anyone’s stuck with me this far I have to admit I’m impressed, but it is actually quite a tricky problem.   It’s one thing to have a single tree of data, where you just navigate through the structure to get the results you want, but quite another to retrieve data at the intersections of several such trees (in OLAP terms, the measure at the intersection of several hierarchies).  Anyway, I’m not going to go into any more depth as far as the nitty-gritty of the coding is concerned, but here are a few interesting things I picked up;

  • .Net’s Reactive Extensions are pretty funky to use. And looks like the kind of thing you should have thought of already.
  • But rather more complicated when you actually come to use them.
  • Computers obey the letter of the code, not the spirit.  This sometimes means that you get results you never quite expect.
  • This goes doubly when ping-ponging values around with Rx.
  • It is possible to make a very fast, very efficient in-memory data structure to replicate an OLAP cube, but sometimes you just have to bow to the inevitability of yet another function call that you’re sure will screw the whole thing up time-wise.
  • I wish I was using Scala.  (This usually goes without saying).
  • The sense of relief once the unit tests run, and the code is cut is phenomenal.
  • But your co-workers still look at you funny when you punch the air.
  • Newton was right.
  • Which is why, perhaps using Scala would have been a bit more painful.

Of Insanity and Programming Languages

January 11, 2010
tags:

From The C Programming Language;

And yet I saw them in a limitless stream– flopping, hopping, croaking, bleating– sorting themselves inhumanly through the spectral moonlight in a grotesque, malignant saraband of fantastic nightmare. Their croaking, baying voices called out in the hideous language of the Old Ones:

      void Rlyeh
      (int mene[], int wgah, int nagl) {
      int Ia, fhtagn;
      if (wgah>=nagl) return;
      swap (mene,wgah,(wgah+nagl)/2);
      fhtagn = wgah;
      for (Ia=wgah+1; Ia<=nagl; Ia++)
      if (mene[Ia]<mene[wgah])
      swap (mene,++fhtagn,Ia);
      swap (mene,wgah,fhtagn);
      Rlyeh (mene,wgah,fhtagn-1);
      Rlyeh (mene,fhtagn+1,nagl);

      } // PH'NGLUI MGLW'NAFH CTHULHU!

Recursion may provide no salvation of storage, nor of human souls; somewhere, a stack of the values being processed must be maintained. But recursive code is more compact, perhaps more easily understood– and more evil and hideous than the darkest nightmares the human brain can endure.

The scary thing is that I’m not sure this is far off, even with the lovecraftian additions…

Patterns

January 5, 2010
tags:

I have a personal project I’m working on in my free time – a sort of customisable gaming platform of sorts, probably nothing exceptionally innotive at the moment, but a fairly fun hobby (to me at least).  Currently Exoverse (working title) is very much at the Hello World stage, but the current, limited, implementation is 100% Scala, supported by a couple of Java libraries.  It was whilst putting together a simple rules engine to handle input and entity movement that I came across yet another element that I wonder how I did without; Scala’s partial functions.

Quite often in the past I’ve had to work around  Java’s limitations for passing objects through a rules-type engine; switch statements are hard-coded and inflexible (if fast, I feel), and any other solution is either equally inflexible (lengthy if… else if… else blocks) or feels somewhat clumsy and inelegant.  So I was rather pleased when I discovered the application of Scala’s Partial Functions to my problem – wanting an open-ended ruleset to process the message objects I pass into a given Behaviour.  In essence a Partial Function takes a single input parameter and returns a single return value, but the key factor is that a Partial Function is not defined for the full input set.  Effectively, Partial Functions (at least at my current level of use) are simply case statements, with methods to determine whether a given input is valid for that function, and to string together a number of partial functions into what is effectively a dynamically generated switch statement.  below is an example of my Handler code – which takes inputs of type T, and returns a result of type R based on the rules passed to the inheriting class.  For those unfamiliar with Scala, traits are essentially interface-style mixins which can contain concrete implementation.

trait Handler[T,R]    {
     var be: List[PartialFunction[T,R]] = Nil

     def add (b: PartialFunction[T,R]) = be = b :: be
     def remove (b: PartialFunction[T,R]) = be = be - b

     def handle(t: T): Option[R] =    {
         var y: PartialFunction[T,R] = be.head
         be.tail foreach    {z => y = y orElse z}
         if(y isDefinedAt t)    Some(y(t)) else None
     }
}

The meat of this simple rules engine is in the handle function, which accepts the type-T input, and returns an Option – a value that can either encapsulate and object of type R or the value None – intended to negate the possibility of a NullPointerException.  The case statement is assembled by stringing together a list of Partial Functions with the orElse function.  Finally, the assembled function is checked to see if the input is covered (isDefinedAt) before returning the appropriate response.