Tuesday, 26 May 2015

Consuming Free APIs

My day job has recently thrown up a nice new challenge: producing a public API for the company's core system to give 3rd party developers simpler access to the ecosystem.  Historically 3rd party developers have been able to interface with and customize the system by consuming an SDK, but that was in the days of an on-premises installation of the system.  We are pushing towards a multi-tenanted, single instance, cloud based offering, and this cannot be extended or customized by the use of an SDK.  How could the multiple customers using the system each have their own extensions and customizations, purchased from 3rd parties, when there is only a single instance of the service centrally hosted in the cloud?

As a result the decision was made to replace the SDK with a RESTful API which allows access to the data of the individual customer and thus multiple extensions could all coexist, simply accessing the system via the new API.

This new track of development got me thinking.  I had produced APIs in the past, both non-web based APIs (using such legacy technologies as COM+ among others) and web based APIs, one using a SignalR interface performing a large amount of business logic and data persistence.  However, none of these APIs had been produced entirely as a public API for consumption by 3rd parties.  All had a target consuming application, be it a cooperating 3rd party (with the potential to resell to other customers) or an internal partner dev team in the case of the SignalR interfaced system.  These systems were developed as APIs as a way of separating concerns, allowing the multiple teams to develop in isolation, concentrating on their own area of expertise, with an agreed interface.  That is the key advantage of an API.

So having not really thought in depth about the creation of an API for general public use, equally I had not thought about this from the other side: consuming a publicly accessible API produced by someone with whom I have no professional relationship.



I have for some time been toying with the idea of producing a mobile app, but have not yet come up with an idea of something I want to create for the public to see.  I am of the opinion that creating something rather than nothing is a better idea; be it imperfect or not, I should simply create something and release it, and that is what I will do.  This is a way of thinking that is a little new to me.  Previously I was of the opinion that if it was not a 'great' idea there was no point in trying it.  But John Sonmez's blog post on the subject argues that simply creating something regularly is better than sitting on your hands until a perfect idea comes along.

So where do I start? Years ago I thought that indie games would be the best approach.  I still think this would be a good thing to do, and potentially very profitable, but it is an approach that could take a lot of effort to get something out.  More importantly, it is something I have little experience in, particularly the design and aesthetics aspect.  If I want to create something fairly rapidly in my spare time (which is scarce) I need to find a short-cut.  This is where the use of APIs comes in.

There are thousands of publicly and freely accessible APIs out there.  Web sites like https://www.mashape.com, http://www.programmableweb.com/ and http://freeapi.net/ offer a range of APIs, and are just the tip of the iceberg for what is available.  Creating a mobile application that consumes one of these APIs should be fairly straightforward, and as I have no great ideas of my own, my plan is to look through the lists of readily available APIs, and try to come up with an idea of how to present its functionality in a new way within a mobile app.

This approach has the massive advantage that someone else has done all the hard work of creating the API, hopefully thought through all the logic and created something that will do what it claims to do.  So if I can present a good human interface to use such an API, then I will have a useful app, and something that the public may want to consume.

Monday, 18 May 2015

The Wrong Path?

I read an email post today regarding the benefits of doing something rather than nothing, even if it might be the wrong thing to do.  This post by John Sonmez uses the quote:
"Sometimes you can only discover the right path by taking the wrong path 50 times."
This struck a real chord with me on a number of levels.  I have a current desire to enter into the world of mobile apps, the problem here being that I need an idea for an app to create.  I have written at some length here about the use of genetic algorithms to solve difficult mathematical problems, an approach based entirely in exploring the 'wrong path'.  I started a SaaS project with some friends a while back that has stalled somewhat due to a number of reasons, not least a reluctance to expose the beta version to wider public scrutiny.  And lastly, in my day job I am working on a public API to the company's core system.  This is a new venture, and we are finding ourselves galloping down the wrong path on a daily basis, and returning to try the left fork rather than the right very quickly.  It has been a very refreshing approach from a software house that is steeped in protracted design and analysis phases before development.  It is an approach that is producing some considerable discomfort in a majority of the dev team (testers included), but one that I am relishing.



A big part of the problem is that we are attempting to create an automated testing suite as we develop.  This is a great idea in itself; however, the testers who are defining the content of this test suite are struggling a little with the idea of the expectations of the system changing faster than the tests are being produced.

For example, consider requesting the list of entities related to another entity, using an HTTP GET request with the URL /entityA/{id}/entityB to get all the instances of entityB which belong to the entityA with id = {id}.  If there is no entityA with id = {id}, the testers argue that the system should return an error, most likely a 404 not found error.  This assumption was backed up with the rationale that if the request /entityA/{id} was used then a 404 error would be returned.  In contrast, if entityA with id = {id} does exist, but has no entityB children, then an empty list would be valid.

A compromise was made that for the example we looked at, it was not valid to have an entityA with no entityB children, so if an empty list was retrieved from the database, then a 404 would be returned.  The developers were happy, as we could simply return a 404 for 'routed' get all requests resulting in a zero length list, successful blank lists for 'direct' get all requests (/entityB/) and a 404 for a direct get single (/entityA/{id}) with no match.

This meant that no validation of {id} was required, so a single database query will give all the information we need.

A couple of days later this became more complex.  An example was found with entityC and entityD, where entityC could exist with no children of type entityD, so the request /entityC/{id}/entityD could produce no results for two situations, entityC with id = {id} may not exist, or it may exist with no children.  The code as it stood returned a successful result with a blank list in both circumstances.  The testers did not like this as we had made the decision to go this way with analysis of only the A,B case, without consideration of the C,D case.

The final decision of what we will do is yet to be made, however we have a system that supports the query of both A,B and C,D working right now.  If we had started analyzing all the possible combinations of entity types in the system (100s) before we decided on a way to go for any routed queries we would have nothing for a good month.

We may have taken the wrong path (the testers would say that we have), or we may have chosen the right one.  The architects are arguing the latter, based on reducing database hits to a minimum: checking whether entityA with id = {id} exists would require another DB call after we know that the full query returns nothing, and for more complex queries with multiple routings this may amount to a significant increase in DB traffic, and therefore cost, as it's a cloud based DB.
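
The two positions can be sketched in code.  This is only an illustration of the trade-off, not the code from our system: `FakeStore`, `ChildrenOf`, `ParentExists` and `RoutedQuery` are hypothetical names, and the in-memory store exists purely to count how many queries each policy costs.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the database.  It counts queries so the
// cost of each policy can be compared.
public class FakeStore
{
    private readonly HashSet<long> parents;
    private readonly List<(long ParentId, string Name)> children;

    public int QueryCount { get; private set; }

    public FakeStore(IEnumerable<long> parents,
                     IEnumerable<(long ParentId, string Name)> children)
    {
        this.parents = new HashSet<long>(parents);
        this.children = children.ToList();
    }

    // Stand-in for the single query behind GET /entityA/{id}/entityB
    public List<string> ChildrenOf(long parentId)
    {
        QueryCount++;
        return children.Where(c => c.ParentId == parentId)
                       .Select(c => c.Name)
                       .ToList();
    }

    // The extra lookup needed to tell "no parent" apart from "no children"
    public bool ParentExists(long parentId)
    {
        QueryCount++;
        return parents.Contains(parentId);
    }
}

public static class RoutedQuery
{
    // One query only: an empty list always maps to the same status, whether
    // the parent is missing or simply childless.
    public static (int Status, List<string> Body) SingleQuery(
        FakeStore store, long id, int statusWhenEmpty)
    {
        var rows = store.ChildrenOf(id);
        return (rows.Count > 0 ? 200 : statusWhenEmpty, rows);
    }

    // A second query is issued only when the first comes back empty: 404 for
    // a missing parent, 200 with an empty list for a childless one.
    public static (int Status, List<string> Body) WithParentCheck(
        FakeStore store, long id)
    {
        var rows = store.ChildrenOf(id);
        if (rows.Count > 0) return (200, rows);
        return (store.ParentExists(id) ? 200 : 404, rows);
    }
}
```

With `SingleQuery` the caller picks one status for every empty result (404 for the A,B compromise), while `WithParentCheck` gives the testers the distinction they want at the price of one more query whenever the list is empty.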

Whether this will get us to El Dorado in the end only time will tell.  We are due to engage early with a consumer of the API, and their preference on what we return will be the deciding factor, but at least we are in a position very early to show them something.

Wednesday, 22 April 2015

Never Lose an Idea

It may be a bit off topic for this blog, but it's something that has plagued me so I thought I would document some things I have done to help myself.  I, like many a dev, am forever looking for the next great idea that I can develop to potentially create a secondary source of income, and the dream is that this secondary source becomes great enough to stop the primary source of working for someone else.

The problem is getting the idea.  Well, we all have ideas.  The efficient programmer blog talks about the fact that we can all come up with ideas, but generally it does not happen when we sit down and try to have an idea.  This was the case for me last week: I had an idea for a SaaS system whilst in the car on the commute home.  Fortunately I was able to explore the idea, and it turned out not to be a particularly viable idea as someone already has a system offering the same service very well at a cheap price.  But what if I had not been in a position to explore the idea straight away?  What if I had been too busy to use the idea right away?  What if I had been on a roll and came up with too many ideas to deal with at once?  How could I keep track of them all?  How could I organise them for a later date?  Well, the efficient programmer blog offered a good solution to this, and I will look to document my progress through implementing this for myself.

The system they proposed is the use of Trello as a repository for the ideas, and the use of an orchestration system to forward emails into the Trello board.  I am looking to implement a similar system, using only Trello with no orchestration software to set up.

I created 3 boards in Trello: one to contain ideas for content matter (blogs, videos etc), one for mobile app ideas, and one for SaaS system ideas.  On each of these boards I added 5 lists: Ideas; To Develop; In Progress; Done; and Rejected.  The plan is that new ideas go into the first.  Once I have done a little background investigation and decided it's a feasible idea it will move to the second.  When work begins for real it moves into the third, and when something goes live the card goes to Done.  Anything noted as not feasible goes into the Rejected list.



This gives me the ability to organise and file the ideas, categorising between content/app/SaaS and progressing the ideas.  It also gives the ability to track rejected ideas so I don't waste time coming up with the same idea every 2 months.  The advantage of the efficient programmer approach was the ability to email in ideas, so as long as you have access to a device with email capability you can record ideas.  I also added this ability, but my approach does not require any other software; it uses inbuilt Trello functionality.

To hook up the ability to email a card into Trello, go to the Menu and select 'Email to Board Settings'.  This will bring up a panel with an email address, and the options of which list to add the cards generated to.  You can also choose to add cards to the bottom or the top of the list.  I selected 'Ideas' list and 'Bottom', so the oldest ideas will always be at the top for sanitising.


The approach used by efficient programmer, on the face of it, allows you to have your different categories (content/app/SaaS) on a single Trello board as multiple lists, whereas my approach requires one board per category with the multiple lists used as a progress status indicator.

Sending a test email to that address produces a card, which takes the subject of the email as its title and the body of the email as the card description.

The one problem I have found is the inability to configure the email address associated with the board, but other than that, this approach gives a solution to the problem of recording and organising ideas that are generated at inopportune moments, with little set up.  I set up this entire system in under 20 minutes, including taking screen shots along the way.

Finally, if you found this blog useful and interesting please follow me and share this with anyone and everyone.

Monday, 20 April 2015

How Automated Integration Testing Can Break Down


Automated integration testing to me means being able to run tests against a system that test the way some or all of the parts integrate to make the whole, fulfilling the requirements of the system, in a way that can be repeated without manual intervention.  This may be as part of a continuous integration type build process, or simply a human initiating the test run.
In order to perform this task, the tests must be able to be run in any order, as any subset of the entire suite, and in parallel with no implications on the result of any test.

For the purposes of automated integration testing, I am an advocate of a BDD style approach using a test definition syntax such as that of cucumber, and I will explore a case where I have worked on this approach to integration testing only to be bitten by some inadequacies of the way it was implemented.  I will begin by saying that the choice of using cucumber style tests for this project was not necessarily wrong, but greater pre-planning of the approach was needed due to some specific technical issues.

System under Test

To give some background around the system that we were building, it consisted of a small tool that ran a series of stored procedures on a database, which output data in XML format.  This data was then transformed using an XSLT, and the result was uploaded to an API endpoint.  The scope of the testing was the stored procedures and XSLTs, so ensuring that the data extracted and transformed conformed to the schema required by the API and that it was composed of the data expected from the database being used.
The database itself was the back end of a large, mature system, the structure and population of which was the responsibility of another team.  Additionally the system that populates this database has a very thick and rich business layer, containing some very complex rules around the coupling within the data model which are not represented by the simple data integrity rules of the data model itself.  The data model is held only as a series of SQL scripts (original creation plus a plethora of update scripts) and as such proves difficult to integrate into a new system.
During the day to day use of the existing, mature system the data is changed in various ways, and it is the job of the new system to periodically grab a subset of this data and upload it to the API using an incremental change model.  So the tests are required to validate that the stored procedures can run for a first time to get all data, and after some modification to get the incremental changes.
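
To make the shape of that pipeline concrete, here is a minimal sketch of the transform and validate stages using the standard .NET XML classes.  It is not the project's code: `ExtractPipeline` and its two methods are hypothetical names, and the XML from the stored procedures, the XSLT and the API schema are all assumed to be supplied as strings.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;
using System.Xml.Xsl;

public static class ExtractPipeline
{
    // Applies an XSLT stylesheet to the XML produced by a stored procedure
    // and returns the transformed document as text.
    public static string Transform(string sourceXml, string xslt)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(xslt)))
        {
            transform.Load(xsltReader);
        }

        using (var input = XmlReader.Create(new StringReader(sourceXml)))
        using (var output = new StringWriter())
        {
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }

    // Validates the transformed XML against the schema and returns any
    // validation messages; an empty list means no errors were reported.
    public static List<string> Validate(string xml, string xsd)
    {
        var schemas = new XmlSchemaSet();
        using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
        {
            // null means "use the target namespace declared in the schema"
            schemas.Add(null, xsdReader);
        }

        var errors = new List<string>();
        XDocument.Parse(xml).Validate(schemas, (sender, e) => errors.Add(e.Message));
        return errors;
    }
}
```

A "Then the output will be valid" step could then assert that `Validate` returns an empty list for the output of `Transform`, with further assertions on the content for the expected data.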

How Integration Tests were Set Up

The tests themselves followed the format:

 Given a blank database  
 And data in table "mytable"  
 |field1|field2|field3|  
 |value1|value2|value3|  
 When the data is obtained for upload  
 Then the output will be valid  
With the variation
 Given a blank database  
 And data in table "mytable"  
 |field1|field2|field3|  
 |value1|value2|value3|  
 And the data is obtained for upload  
 And data in table "mytable" is updated to   
 |field1|field2|field3|  
 |value1a|value2a|value3a|  
 Then the output will be valid  

for an incremental change.
This required that a unique blank database be created for that test (and every test in practice), the data be added to the database, the process of running the stored procedures and transforms be performed and the resulting data for upload be used in the validation step.  Creation of a blank database is simple enough, and can be done either by the scripts used by the main system, or as we chose, by creating an Entity Framework code-first model to represent the data model.  The first and most obvious problem, as you will have guessed, is that when the core data model changes, the EF model will need to be updated.  This problem was swept under the carpet as the convention is to not delete or modify anything in the data model, only to extend, but still it is a flaw in the approach taken for the testing.
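
One simple way to get a unique database per test is to derive the database name from the scenario title plus a GUID.  The sketch below only illustrates that idea; `TestDatabaseNames` and the "it_" prefix are assumptions of mine rather than what the project used.

```csharp
using System;
using System.Linq;

public static class TestDatabaseNames
{
    // Builds a database name that is unique per call, so scenarios can run in
    // any order, or in parallel, without sharing state.
    public static string NewName(string scenarioTitle)
    {
        // Keep only letters and digits so the name is safe to use as an identifier
        var safe = new string(scenarioTitle
            .Where(c => char.IsLetterOrDigit(c))
            .Take(40)
            .ToArray());

        return "it_" + safe + "_" + Guid.NewGuid().ToString("N");
    }
}
```

The readable prefix makes it easier to see which scenario left a database behind if a test run is interrupted before clean up.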

The second and in my opinion most problematic area of this approach comes from data integrity.  The EF model contains the data integrity rules of the core data model and enforces them; however, if for example in the test above 'mytable' contained a foreign key constraint from 'field1' to some other table ('mytable2'), this other table would also need populating with some data to maintain data integrity.  The data in 'mytable2' is not of interest to the test as it is not extracted by the stored procedures, so any data could be inserted so long as the constraints of the data model are met.  To this end a set of code was written to auto-populate any data required for integrity of the model.  This involved writing some backbone code to cover the general situation, and one class for each table that was to be populated as part of a test scenario.  For example, the helper
  class AlternativeItemHelper : EntityHelper<AlternativeItem>  
   {  
     public AlternativeItemHelper(S200DBBuilder dbbuilder)  
       : base(dbbuilder)  
     {  
       PrimaryKeyField = "AlternativeItemID";  
       ForeignKeys.Add(new ForeignKeySupport.ForeignKeyInfo<AlternativeItem, StockItem> { LocalField = "ItemID", ForeignField = "ItemID", Builder = dbbuilder });  
       ForeignKeys.Add(new ForeignKeySupport.ForeignKeyInfo<AlternativeItem, StockItem> { LocalField = "ItemAlternativeID", ForeignField = "ItemID", Builder = dbbuilder });  
     }  
   }  
will create data in table 'StockItem' to satisfy the constraints on the 'ItemID' and 'ItemAlternativeID' fields of the 'AlternativeItem' table.
As you can imagine, if 'mytable2' contains a similar foreign key relation, then 3 tables need to be populated, and with the real data model, this number grows very large for some tables due to multiple constraints on each table.  In one test scenario the addition of one row of data to one table resulted in over 40 tables being populated.  This problem was not seen early, as the first half dozen tables populated did not have any such constraints, so the out of the box EF model did not need any additional data to be satisfied.
One advantage of the use of an EF model that I should highlight at this stage is the ability to set default values for all fields, this means that non-nullable fields can be given a value without it being defined in the test scenario (or when auto populating the tables for data integrity reasons alone).
If the data in the linked tables was of interest to the test scenario, then the tables could be populated in the pattern used in the example scenario, and so long as the ordering of the data definition was correct the integrity would be maintained with the defined data.
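
The way one row can fan out into dozens of tables is easy to see if the foreign keys are treated as a graph.  The following is a simplified sketch of that walk, not the helper code itself: `DependencyWalker` is a hypothetical name, the foreign keys are assumed to be available as a plain dictionary of table name to referenced table names, and any table names beyond those in the example above are made up.

```csharp
using System.Collections.Generic;

public static class DependencyWalker
{
    // Returns every table that must hold data before a row can be added to
    // 'table', ordered so that referenced tables come before the tables that
    // reference them.  The requested table itself is last in the list.
    public static List<string> TablesToPopulate(
        string table, IDictionary<string, string[]> foreignKeys)
    {
        var ordered = new List<string>();
        Visit(table, foreignKeys, new HashSet<string>(), ordered);
        return ordered;
    }

    private static void Visit(
        string table,
        IDictionary<string, string[]> foreignKeys,
        HashSet<string> seen,
        List<string> ordered)
    {
        // A table reached by several constraints is only populated once
        if (!seen.Add(table)) return;

        if (foreignKeys.TryGetValue(table, out var referenced))
        {
            foreach (var parent in referenced)
            {
                Visit(parent, foreignKeys, seen, ordered);
            }
        }

        // Added after its parents so inserts in this order satisfy the constraints
        ordered.Add(table);
    }
}
```

Running this over a real data model before writing a scenario would at least show up front how many tables a single "And data in table" step is going to touch.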

The third problem was the execution time for the tests.  Even the simplest of tests had a minimum execution time of over one minute.  This is dominated by the database creation step.  In itself this is not a show stopper if tests are to be run during quiet time, e.g. overnight, but if developers and testers want to run tests in real time, and multiple tests are of interest, this meant a significant wait for results.

Summary

The biggest problem with the approach taken was the time required to write a seemingly simple test.  The addition of data to a single table may require developer time to add in the data integrity rules, a large amount of tester time to define what data needs to be in each table required by the integrity rules of the model and whether the default values are sufficient, and potentially further dev time to define the default values for the additional tables.  In the end a suite of around 200 tests was created which takes over 3 hours to run, but due to a lack of testing resource, full coverage was never achieved and management decided that manual testing was the preferred approach.

Supporting Code Examples

The example entity helper for auto populating additional tables is derived from a generic class for help with all entity types, which takes the form
  public abstract class EntityHelper<T> : IEntityHelper<T>  
   where T : class, new()  
   {  
     protected S200DBBuilder dbbuilder;  
     protected DbSet<T> entityset;  
     protected long _id = 0;  
     protected string PrimaryKeyField { get; set; }  
     protected Lazy<GetterAndSetter> PkFieldProp;  
     public Lazy<List<PropertySetter>> PropertySetters { get; protected set; }  
     public EntityHelper(S200DBBuilder dbbuilder):this()  
     {  
       Initialize(dbbuilder);  
     }  
     protected EntityHelper()   
     {  
     }  
     public object GetRandomEntity()  
     {  
       return GetRandomEntityInternal();  
     }  
     protected T GetRandomEntityInternal()  
     {  
       T entity = new T();  
       //need to set all the properties to random values - and cache a way to create them faster  
       PropertySetters.Value.ForEach(ps => ps.SetRandomValue(entity));  
       return entity;  
     }  
     public virtual void Initialize(S200DBBuilder dbbuilder)  
     {  
       this.dbbuilder = dbbuilder;  
       this.entityset = dbbuilder.s200.Set<T>();  
       ForeignKeys = new List<IForeignKeyInfo<T>>();  
       PkFieldProp = new Lazy<GetterAndSetter>(() =>  
       {  
         var type = typeof(T);  
         var prop = type.GetProperty(PrimaryKeyField);  
         return new GetterAndSetter { Setter = prop.GetSetMethod(true), Getter = prop.GetGetMethod(true) };  
       });  
       //initialise the PropertySetters  
       PropertySetters = new Lazy<List<PropertySetter>>(() =>  
       {  
         var list = new List<PropertySetter>();  
         list.AddRange(typeof(T)  
                     .GetProperties()  
                     .Where(p => !p.Name.Equals("OpLock", StringComparison.OrdinalIgnoreCase))  
                     .Where(p => !(p.GetGetMethod().IsVirtual))  
                     .Select(p => PropertySetterFactory.Get(dbbuilder.s200, p, typeof(T)))  
                     );  
         return list;  
       });  
     }  
     protected virtual T AddForeignKeys(T ent)  
     {  
       UpdatePKIfDuplicate(ent);  
       ForeignKeys.ForEach(fk => CheckAndAddFK(fk, ent));  
       return ent;  
     }  
     protected void UpdatePKIfDuplicate(T ent)  
     {  
       //assumes all keys are longs  
       var pk = (long)PkFieldProp.Value.Getter.Invoke(ent, new object[] { });  
       var allData = entityset.AsEnumerable().Concat(entityset.Local);  
       while (allData.Where(e => PkFieldProp.Value.Getter.Invoke(e, new object[] { }).Equals(pk)).Count() >0)  
       {  
         pk++;  
         PkFieldProp.Value.Setter.Invoke(ent, new object[] {pk });  
       }  
     }  
     protected T ReplicateForeignKeys(T newent, T oldent)  
     {  
       ForeignKeys.ForEach(fk => fk.CopyFromOldEntToNew(oldent, newent));  
       return newent;  
     }  
     public void AddData(IEnumerable<T> enumerable)  
     {  
       entityset.AddRange(enumerable.Select(ent => AddForeignKeys(ent)));  
     }  
     public void UpdateData(IEnumerable<T> enumerable)  
     {  
       foreach (var newent in enumerable)  
       {  
         var oldent = GetCorrespondingEntityFromStore(newent);  
         UpdateEntityWithNewData(oldent, newent);  
         dbbuilder.s200.Entry(oldent).State = EntityState.Modified;          
       }  
     }  
     protected void UpdateEntityWithNewData(T oldent, T newent)  
     {  
       foreach (var prop in typeof(T).GetProperties())  
       {  
         //todo - change this line to be a generic check on the prop being a primary key field  
         if (prop.Name.Equals("SYSCompanyID")) continue;  
         var newval = prop.GetGetMethod().Invoke(newent, new object[] { });  
         // Not sure if this is the correct place to do this, will check with Mike W  
         if (newval != null)  
         {  
           var shouldUpdateChecker = UpdateCheckers.Get(prop.PropertyType);  
           shouldUpdateChecker.Update(newval, oldent, prop.GetSetMethod());  
         }  
       }  
     }  
     public void Delete(T entity)  
     {  
       var storeentity = GetCorrespondingEntityFromStore(entity);  
       //remove the instance held in the store, not the copy that was passed in  
       DeleteEntity(storeentity);  
     }  
     private void DeleteEntity(T entity)  
     {  
       entityset.Remove(entity);  
     }  
     public void Delete(long id)  
     {  
       var entity = GetById(id);  
       DeleteEntity(entity);  
     }  
     public void DeleteAll()  
     {  
       var all = entityset.ToList();  
       entityset.RemoveRange(all);  
     }  
     public long AddSingle(T entity)  
     {        
       var id = Interlocked.Increment(ref _id);  
       SetId(entity, id);  
       AddData(new[] { entity });  
       return id;  
     }  
     protected void SetId(T entity, object id) { PkFieldProp.Value.Setter.Invoke(entity, new[] { id }); }  
     protected T GetCorrespondingEntityFromStore(T newent) { return GetById(PkFieldProp.Value.Getter.Invoke(newent, new object[] { })); }  
     protected T GetById(object id) { return entityset.AsEnumerable().Single(ent => PkFieldProp.Value.Getter.Invoke(ent, new object[] { }).Equals(id)); }  
     public void UpdateAllEntities(Action<T> act)  
     {  
       entityset.ToList().ForEach(act);  
     }  
     public void UpdateEntity(int id, Action<T> act)  
     {  
       //keys are longs, and a boxed int never Equals a boxed long, so widen the id first  
       var entity = GetById((long)id);  
       act(entity);  
     }  
     public IEnumerable GetDataFromTable(Table table)  
     {  
       return table.CreateSet<T>();  
     }  
     public void AddData(IEnumerable enumerable)  
     {  
       var data = enumerable.Cast<T>();  
       AddData(data);  
     }  
     public void UpdateData(IEnumerable enumerable)  
     {  
       var data = enumerable.Cast<T>();  
       UpdateData(data);  
     }  
     protected List<IForeignKeyInfo<T>> ForeignKeys { get; set; }  
     protected void CheckAndAddFK(IForeignKeyInfo<T> fk, T ent)  
     {  
       //first get the value on the entity and check if it exists in the model already  
       fk.CreateDefaultFKEntityAndSetRelation(ent);  
     }  
     public void CreateDefaultEntity(out long fkID)  
     {  
       var entity = new T();  
       fkID = AddSingle(entity);  
     }  
     public void CreateDefaultEntityWithID(long fkID)  
     {  
       var entity = new T();  
       SetId(entity, fkID);  
       AddData(new[] { entity });  
       //the _id field needs to be greater than the id used here, so   
       fkID++;  
       if (fkID >= _id)  
         Interlocked.Exchange(ref _id, fkID);  
     }  
   }  

To create default values for any entity we created a partial class to extend the entity class that has one method:
  public partial class BinItem  
   {  
     partial void OnCreating()  
     {  
       DateTimeCreated = DateTime.Now;  
       BinName = string.Empty;  
       SpareText1 = string.Empty;  
       SpareText2 = string.Empty;  
       SpareText3 = string.Empty;  
     }  
   }  
this method being called as part of the constructor of the entity class
  [Table("BinItem")]  
   public partial class BinItem  
   {  
     public BinItem()  
     {  
          ...
          OnCreating();  
     }  
     partial void OnCreating();  
     ...  
   }  
The entity helpers rely upon foreign key information to know about the constraints on the table, these are supported by a class
  public class ForeignKeyInfo<T, T2> : IForeignKeyInfo<T>  
     where T : class,new()  
     where T2 : class,new()  
   {  
     public ForeignKeyInfo()  
     {  
       BuildIfNotExists = true;  
       LocalFieldSetter = new Lazy<MethodInfo>(() =>  
         {  
           var type = typeof(T);  
           var prop = type.GetProperty(LocalField);  
           if (prop == null)  
             prop = type.GetProperty(LocalField+"s");  
           return prop.GetSetMethod(true);  
         });        
       LocalFieldGetter = new Lazy<MethodInfo>(() =>  
       {  
         var type = typeof(T);  
         var prop = type.GetProperty(LocalField);  
         if (prop == null)    
           prop = type.GetProperty(LocalField + "s");  
         return prop.GetGetMethod(true);  
       });  
       ForeignFieldGetter = new Lazy<MethodInfo>(() =>  
       {  
         var type = typeof(T2);  
         var prop = type.GetProperty(ForeignField);  
         if (prop == null)  
           prop = type.GetProperty(ForeignField + "s");  
         return prop.GetGetMethod(true);  
       });  
       ForeignTableGetter = new Lazy<MethodInfo>(()=>   
         {  
           var type = typeof(S200DataContext);  
           var prop = type.GetProperty(typeof(T2).Name);  
           if (prop == null)  
           {  
             prop = type.GetProperty(typeof(T2).Name+"s");  
             if (prop == null && typeof(T2).Name.EndsWith("y"))  
             {    
               var currentName = typeof(T2).Name;  
               prop = type.GetProperty(currentName.Substring(0,currentName.Length-1) + "ies");  
             }  
             if (prop == null && typeof(T2).Name.EndsWith("eau"))  
             {  
               prop = type.GetProperty(typeof(T2).Name + "x");  
             }  
             if (prop == null && typeof(T2).Name.EndsWith("s"))  
             {  
               prop = type.GetProperty(typeof(T2).Name + "es");  
             }  
           }  
           var getter = prop.GetGetMethod(true);  
           return getter;  
         });  
     }  
     public string LocalField { get; set; }  
     public S200DBBuilder Builder { get; set; }  
     public string ForeignField { get; set; }  
     public bool DoesFKExist(T ent)  
     {  
       //check the foreign table to see if an entry exists which matches the ent  
       var lf = LocalFieldGetter.Value.Invoke(ent, new object[] { });        
       return GetForeignEnts(lf).Count()> 0;  
     }  
     public void CreateDefaultFKEntityAndSetRelation(T ent)  
     {  
       if (DoesFKExist(ent))  
       {  
         return;  
       }  
       var lf = LocalFieldGetter.Value.Invoke(ent, new object[] { });  
       if (lf == null)  
       {  
         if (BuildIfNotExists)  
         {  
           //the test did not define the FK ID to use, so just default it to the next in the sequence  
           long fkID = 0;  
           Builder.WithDefaultEntity(typeof(T2), out fkID);  
           //now set the FK relation  
           LocalFieldSetter.Value.Invoke(ent, new object[] { fkID });  
         }  
       }  
       else  
       {  
         //create the FK entity using the id that has been passed in  
         Builder.WithDefaultEntityWithID(typeof(T2), (long)lf);  
       }  
     }  
     private T2 GetForieignEnt(object fkID)  
     {  
       return GetForeignEnts(fkID).FirstOrDefault();  
     }  
     private IEnumerable<T2> GetForeignEnts(object fkID)  
     {  
       var castData = (DbSet<T2>)(ForeignTableGetter.Value.Invoke(Builder.s200, new object[] { }));  
       var allData = castData.AsEnumerable().Concat(castData.Local);  
       var fes = allData.Where(fe => ForeignFieldGetter.Value.Invoke(fe, new object[] { }).Equals(fkID));  
       return fes;  
     }  
     private Lazy<MethodInfo> LocalFieldSetter;  
     private Lazy<MethodInfo> LocalFieldGetter;  
     private Lazy<MethodInfo> ForeignFieldGetter;  
     private Lazy<MethodInfo> ForeignTableGetter;  
     public T CopyFromOldEntToNew(T oldent, T newent)  
     {  
       if (DoesFKExist(newent))  
       {  
         return newent;  
       }  
       var value = LocalFieldGetter.Value.Invoke(oldent, new object[] { });  
       LocalFieldSetter.Value.Invoke(newent, new object[] { value });  
       return newent;  
     }  
     public bool BuildIfNotExists { get; set; }  
   }  

Monday, 13 April 2015

BDD to Break Communication Barriers

In my years of software development, the one underlying theme that has caused the most problems in delivering quality software that fits the needs of the customer is the misunderstanding and miscommunication of requirements.  This is not a problem caused by any individual doing their job badly, nor by poor communication skills, nor by a lack of effort or diligence by anyone.  It is simply due to the number of times the same information is written, spoken, heard, interpreted and misinterpreted.  It's a tale of Chinese whispers, a term that is probably not politically correct in this day and age, but one that fits the bill.  In an idealised scenario:

  1. Customer dreams up something they want a system to do
  2. Business analyst and sales team talk to them to nail down chargeable and isolated items (stories if you like)
  3. Business analysts document the stories for the technical team(s)
  4. Architects and developers read these, interpret and develop a solution
  5. Testers read and interpret the BA's docs and create a test suite

In this scenario the requirements are 'written' 2 times by 2 people and 'read' 3 times by 3 people, but reinterpreted at each step, so what the 3 layers of this process (customer; BA and sales; dev and test) understand the system needs to do can be vastly different, especially in the detail.  A situation can occur where the BA slightly misunderstands the customer's needs, then does not communicate their understanding thoroughly.  The dev and test teams pick this up and interpret it in 2 further, subtly different ways.  All three layers of the process think they understand everything fully, the system 'works' inasmuch as it performs a function, and all of the tests pass, so the system is delivered or demoed to the customer, but it does not do what the customer really wanted it to do.  Where did the failure happen?  Everyone in the process has performed their own individual part successfully, and something has been built that everyone will be to some extent satisfied with, but it's not what the customer wants.

What is missing here is a single item that all players can refer to and agree upon as fully defining the requirements.  Conventional wisdom would say that the document produced by the BA is that item, and ideally that would work: one document, all parties agree to its contents, all code is built to its specifications, all tests validate these specifications and all the desires of the customer are met.  In practice, the 3 ways this document is interpreted mean the one single document conveys three messages.  So how can we overcome this?

The answer is a document that can be 'read' by a machine and interpreted in only one way, so that the code that makes the system work and the tests that validate it share the same interpretation.  So long as the customer can also read it easily in a non-technical manner and agree to it, the loop is closed.  The customer agrees to the specification as written in the document, the tests prove that the specified behaviour happens and works, and the code is written to satisfy the tests.  No ambiguity remains.

This forms something of the basis for the concept of Behaviour Driven Development (BDD), where the desired behaviour of the system is defined and forms the specifications and the tests, and drives the development to meet these.  It is akin to test driven development, but the overall behaviour of the system is the driver rather than the technical test specifications, which in general do not cover the overall system, only isolated units of it.  The core advantage of BDD is that the behaviour specification is written in a language that can both be read by non-technical personnel (customers) and interpreted by the computer without translation.

The syntax of defining specifications is a well established one, and one that has many flavors (GWT, AAA etc).  A DSL was developed to encapsulate the given-when-then formulation of requirements and has been given the name Gherkin.  For a .NET project, SpecFlow is the tool I currently choose to use.  In the past I have used Cucumber, and had to write Ruby code to access the actual code of the .NET system; the advantage of SpecFlow is that the specifications, the code to access the system and the system under test itself can all exist in one place, written in one development language.

I am not writing a post on how to perform BDD here.  I am looking to highlight the advantages that a BDD tool like SpecFlow brings to the development process, and specifically to the communication of detailed technical ideas between different groups and disciplines within the process, without ambiguity of interpretation creeping in.  That said, a simple test specification taken from the Cucumber website provides a good place to start in terms of understanding how this fits into the dev process.

 Feature: CalculatorAddition  
      In order to avoid silly mistakes  
      As a math idiot  
      I want to be told the sum of two numbers  
 Scenario: Add two numbers  
      Given I have entered 50 into the calculator  
      And I have entered 70 into the calculator  
      When I press add  
      Then the result should be 120 on the screen  

This specification is the example that is provided when you add a new spec to a unit test project in Visual Studio using the SpecFlow add-in, and it provides a good point from which to explore the way Gherkin solves the problem of interpretation of requirements.

This specification is very easy to understand: a non-technical person, e.g. the customer, could read it and sign off that it details what they need the system to be able to do.  The question is how this satisfies the needs of the testers to validate that the system does what is described.  Well, that is the beauty of the Cucumber/SpecFlow system.  This series of definitions constitutes a test as well as a requirement specification.  The SpecFlow framework executes a piece of code for each of these lines, and the code is hooked in by a regular expression match of the definition itself against an attribute on the code method.  The convention is that the 'Then' definition validates the outcome of the action against the expectations (does an assert, if you prefer).  The code that needs to be written to hook this into the production system is very straightforward:
 [Binding]  
   public class CalculatorAdditionSteps  
   {  
     Calculator calc = new Calculator();  
     [Given(@"I have entered (.*) into the calculator")]  
     public void GivenIHaveEnteredIntoTheCalculator(int value)  
     {  
       calc.InputValue(value);  
     }  
     [When(@"I press add")]  
     public void WhenIPressAdd()  
     {  
       calc.DoAddition();  
     }  
     [Then(@"the result should be (.*) on the screen")]  
     public void ThenTheResultShouldBeOnTheScreen(int expectedResult)  
     {  
       Assert.AreEqual(expectedResult, calc.Result);  
     }  
   }  
and this forms the basis of a simple unit test.  As you can imagine, detailing a fully functional production system will involve significantly more code, but with the advantage that if the specifications drive the process, the tests come for free, and the architecture and code design are driven towards a clear and easily instantiated structure.  Minimal coupling and dependencies make the production and maintenance of the 'hook' code here significantly easier.
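
To make the matching step a little more concrete, here is a minimal sketch of how a line of a specification could be matched against a binding pattern and its captured value converted to the parameter type of the method.  It uses only System.Text.RegularExpressions; the StepMatcher class and its TryBind method are hypothetical names invented for this illustration and are not part of SpecFlow, whose real binding mechanism is considerably more involved.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, NOT part of SpecFlow: illustrates regex based step binding.
public static class StepMatcher
{
    // Returns true and invokes the handler when the step text matches the whole pattern.
    public static bool TryBind(string pattern, string stepText, Action<int> handler)
    {
        // Anchor the pattern so the entire step text has to match.
        var match = Regex.Match(stepText, "^" + pattern + "$");
        if (!match.Success)
            return false;

        // The first capture group carries the argument; convert it to the parameter type.
        int value;
        if (!int.TryParse(match.Groups[1].Value, out value))
            return false;

        handler(value);
        return true;
    }
}
```

With this sketch, binding the pattern "I have entered (.*) into the calculator" to the line "I have entered 50 into the calculator" passes 50 to the handler, while the line "I press add" does not match that pattern and the call returns false.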

When performing this as a BDD process, a simple Calculator class will satisfy the needs of the test as far as being able to build:
 class Calculator  
   {  
     internal void InputValue(int value)  
     {  
       throw new NotImplementedException();  
     }  
     internal void DoAddition()  
     {  
       throw new NotImplementedException();  
     }  
     public object Result { get; set; }  
   }  
And when run, the test will fail.  It is also possible to work with only the specification, before the 'hook' code is written, at which stage running the test will give an inconclusive result.  This highlights that the specification has been detailed, but that no work has been performed to hook it in to validate the system, potentially meaning the system has not had the new functionality added.
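
The next step in the cycle is to write just enough production code to make the failing scenario pass.  The following is one possible implementation, offered as a sketch rather than the code from the actual project: it assumes the calculator simply collects the entered values and sums them when add is pressed.

```csharp
using System.Collections.Generic;
using System.Linq;

// One possible implementation (an assumption, not the original project code):
// entered values are collected and summed when add is pressed.
class Calculator
{
    private readonly List<int> _entered = new List<int>();

    internal void InputValue(int value)
    {
        _entered.Add(value);
    }

    internal void DoAddition()
    {
        Result = _entered.Sum();
        _entered.Clear();
    }

    // Kept as object to match the stub; it holds a boxed int after DoAddition.
    public object Result { get; private set; }
}
```

Entering 50 and 70 and then calling DoAddition leaves 120 in Result, which is what the 'Then' step in the scenario asserts.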

There are shortcomings to this approach, but as a way of removing the element of Chinese whispers from the development process it goes a long way to solving the problem.

I will showcase a situation where this approach proved problematic in a future blog, a situation where it did test the system successfully but where the overhead of creating and maintaining the specifications and the hook code outweighed the advantages provided.

Tuesday, 7 April 2015

Visualisation of evolution

Even though we know that the end result is what we are after, and speed is one of the most important factors, it would be nice when assessing different mutation and breeding operator combinations, and the effect of the applied fitness function, to track the evolution of the population graphically, or at least that of the fittest member(s), to quickly see and convey to interested parties what is happening.
To this end I will explore the possibility of hooking a visualisation interface into the algorithm with a minimum of code churn and minimal speed impact.
The approach I will take is to use a consumer object to handle the generation complete event. The responsibility of not blocking the processing thread will fall to this consumer, and all details of the rendering of the interim results will be totally hidden from the genetic algorithm itself.  This approach means that if a web enabled front end, or simply a different desktop UI, were needed, you would merely need to construct it and inject it.
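
As an illustration of what 'not blocking the processing thread' can look like, here is a minimal sketch of a consumer that only queues each result and leaves the slow rendering work to a background task.  This is a generic producer/consumer arrangement built on BlockingCollection, not the code used in the application described in this post; the GenerationConsumer class, its OnGenerationComplete method and the render callback are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical consumer: names and shape are illustrative only.
public sealed class GenerationConsumer : IDisposable
{
    private readonly BlockingCollection<double> _queue = new BlockingCollection<double>();
    private readonly Task _worker;

    public GenerationConsumer(Action<double> render)
    {
        // A single background task drains the queue and does the (slow) rendering.
        _worker = Task.Run(() =>
        {
            foreach (var fitness in _queue.GetConsumingEnumerable())
                render(fitness);
        });
    }

    // Called from the algorithm's thread; it only enqueues, so it returns immediately.
    public void OnGenerationComplete(double bestFitness)
    {
        _queue.Add(bestFitness);
    }

    // Signals that no more results will arrive and waits for the queued ones to be rendered.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _worker.Wait();
        _queue.Dispose();
    }
}
```

The trade-off in this design is that the display can lag behind the algorithm when rendering is slower than evolution, in exchange for the algorithm never waiting on the UI.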

I have chosen to use a WPF GUI as I have experience of automatically updating graphs in this medium. Another technology may be better suited to your skill set. The WPFToolkit offers a very good charting control, which can very easily plot data that is updated in real time by using data binding.  I will not go into the details of the WPF application itself, or the structure of such an application; the details of displaying the evolution are what we are interested in, so that is what I will focus on. I will say, though, that my chosen architecture employed an MVVM pattern.

 The code for each chart is very simple, with the UI layer being simply

 <!-- namespace declaration; this belongs on the root element of the view -->  
 xmlns:chartingToolkit="clr-namespace:System.Windows.Controls.DataVisualization.Charting;assembly=System.Windows.Controls.DataVisualization.Toolkit"  

 <chartingToolkit:Chart Title="Fitness">  
   <chartingToolkit:Chart.Axes>  
     <chartingToolkit:LinearAxis Orientation="Y" ShowGridLines="False" Minimum="{Binding MinFitness}" Maximum="{Binding MaxFitness}" />  
   </chartingToolkit:Chart.Axes>  
   <chartingToolkit:LineSeries DependentValuePath="Value" IndependentValuePath="Key" ItemsSource="{Binding GenerationFitness}" IsSelectionEnabled="True" />  
 </chartingToolkit:Chart>  

Where the MinFitness and MaxFitness are values calculated as results are generated to give a sensible range for the graph, and the GenerationFitness property is a collection holding the points to plot in the graph. This is bound to a view model that exposes the data without exposing the detail of the GA, and this takes the form:

 class ViewModel: NotifyingObject  
   {  
     private Model theData;  
     private double _minFitness;  
     private double _maxFitness;  
     private double varf = 0.01d;  
     private string _results;  
     private int _delay=0;  
     public ViewModel()  
     {  
       theData = new Model();  
       GenerationFitness = new ObservableCollection<KeyValuePair<int, double>>();  
       GenerationF = new ObservableCollection<KeyValuePair<int, double>>();  
       GenerationR = new ObservableCollection<KeyValuePair<int, double>>();  
       theData.NewGeneration += GotNewGeneration;  
       theData.FinalResults += GotFinalResults;  
       ResetFitness();  
     }  
     public int PopulationSize { get { return theData.PopulationSize; } set { theData.PopulationSize = value; } }  
     public int MaxGenerations { get { return theData.MaxGenerations; } set { theData.MaxGenerations= value; } }  
     public double MinFitness { get { return _minFitness; } set { _minFitness = value; OnPropertyChanged(); } }  
     public double MaxFitness { get { return _maxFitness; } set { _maxFitness = value; OnPropertyChanged(); } }  
     public string Results { get { return _results; }set{_results = value; OnPropertyChanged();} }  
     public int Delay { get { return _delay; } set { _delay = value; OnPropertyChanged(); } }  
     public ObservableCollection<KeyValuePair<int, double>> GenerationFitness { get; set; }  
     public ObservableCollection<KeyValuePair<int, double>> GenerationR { get; set; }  
     public ObservableCollection<KeyValuePair<int, double>> GenerationF { get; set; }  
     public ICommand Stop { get { return new RelayUICommand("Stop", (p) => theData.Stop(), (p) => theData.IsRunning); } }  
     public ICommand Start  
     {  
       get  
       {  
         return new RelayUICommand("Start", (p) =>  
         {  
           ClearAll();  
           theData.Start();  
         }  
         , (p) => !theData.IsRunning);  
       }  
     }  
     public ICommand Clear  
     {  
       get  
       {  
         return new RelayUICommand("Clear", (p) =>  
         {  
           ClearAll();  
         }  
           , (p) => !theData.IsRunning);  
       }  
     }  
     private void ResetFitness()  
     {  
       _minFitness = 0d;  
       _maxFitness = 1d;  
     }  
     private void GotNewGeneration(object sender, GenerationEventArgs e)  
     {  
       Application.Current.Dispatcher.Invoke(() =>  
         {   
           GenerationFitness.Add(new KeyValuePair<int, double>(e.Generation, e.Fitness));  
           if (e.Generation ==1)  
           {  
             MaxFitness = e.Fitness * (1d + varf);   
             MinFitness = e.Fitness * (1d-varf);  
           }  
           MaxFitness = Math.Max(MaxFitness, e.Fitness *(1d + varf));  
           MinFitness = Math.Min(MinFitness, e.Fitness * (1d - varf));  
           GenerationF.Add(new KeyValuePair<int, double>(e.Generation, e.F));  
           GenerationR.Add(new KeyValuePair<int, double>(e.Generation, e.R));  
           Debug.WriteLine(String.Format("Generation: {0}, Fitness: {1},R: {2}, F: {3}", e.Generation, e.Fitness, e.R, e.F));  
         });  
       Thread.Sleep(Delay );  
     }  
     private void GotFinalResults(object sender, FinalResultsEventArgs e)  
     {  
       Results = String.Format("R: {0}{1}F: {2}{1}Fitness: {3}{1}{1}From Values:{1}{4}", e.R, Environment.NewLine, e.F, e.Fitness, String.Join(Environment.NewLine, e.GeneValues));  
     }  
     private void ClearAll()  
     {  
       ResetFitness();  
       GenerationFitness.Clear();  
       GenerationR.Clear();  
       GenerationF.Clear();  
       Results = "";  
     }  
   }  

The model behind this does the work of instantiating the GA and it relays the results of each generation and the completion of the run in a meaningful manner to the view model:

 class Model: NotifyingObject  
   {  
     private double targetR = 0.95d;  
     private double targetF = 0.5d;  
     public double TargetR { get { return targetR; } set { targetR = value; OnPropertyChanged(); } }  
     public double TargetF { get { return targetF; } set { targetF = value; OnPropertyChanged(); } }  
     public event EventHandler<DoubleEventArgs> NewFitnessValueArrived;  
     public event EventHandler<GenerationEventArgs> NewGeneration;  
     public event EventHandler<FinalResultsEventArgs> FinalResults;  
     private IGAProvider gaProvider;  
     private GeneticAlgorithm ga;  
     private int _maxGenerations;  
     private bool _isRunning;  
     public Model()  
     {  
       PopulationSize = 10000;  
       MaxGenerations = 100;  
       ////initialise the GA and hook up events  
       const double crossoverProbability = 0.02;  
       const double mutationProbability = 0.8;  
       gaProvider = GetGaProvider();  
       var crossover = new Crossover(crossoverProbability, true)  
       {  
         CrossoverType = CrossoverType.SinglePoint  
       };  
       //var crossover = new AveragingBreeder() { Enabled = true };  
       //inject the mutation algorithm  
       var mutation = new SwapMutate(mutationProbability);  
       //var mutation = new PairGeneMutatorWithFixedValueSum2(mutationProbability){Enabled = true};  
       //var mutation = new SingleGeneVaryDoubleValueMaintainingSum3(mutationProbability, 1d, 0.2d) { Enabled = true };  
       gaProvider.AddMutator(mutation);  
       gaProvider.AddBreeder(crossover);  
     }  
     public void Start()  
     {  
       const int elitismPercentage = 10;  
       var dh = new DoubleHelpers();  
       ga = gaProvider.GetGA(elitismPercentage, dh, PopulationSize);  
       GAF.GeneticAlgorithm.GenerationCompleteHandler generationComplete = ga_OnGenerationComplete;  
       GAF.GeneticAlgorithm.RunCompleteHandler runComplete = ga_OnRunComplete;  
       gaProvider.HookUpEvents(generationComplete, runComplete);  
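       //flag the run as started before launching it, so a very fast run cannot have its completion flag overwritten  
       IsRunning = true;  
       Task.Factory.StartNew(() => ga.Run(gaProvider.Terminate));  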
     }  
     public void Stop()  
     {  
       ga.Halt();  
       IsRunning = false;  
     }  
     private void ga_OnGenerationComplete(object sender, GaEventArgs e)  
     {  
       (gaProvider as DoubleChromosomes).CurrentGeneration = e.Generation;  
       var fittest = e.Population.GetTop(1)[0];  
       var r = (gaProvider as DoubleChromosomes).GetR(fittest.Genes.Select(x => x.RealValue));  
       var f = (gaProvider as DoubleChromosomes).GetF(fittest.Genes.Select(x => x.RealValue));  
       FireGenerationArrived(e.Generation, fittest.Fitness, r, f);  
     }  
     private void ga_OnRunComplete(object sender, GaEventArgs e)  
     {  
       IsRunning = false;  
       var fittest = e.Population.GetTop(1)[0];  
       var r = (gaProvider as DoubleChromosomes).GetR(fittest.Genes.Select(x => x.RealValue));  
       var f = (gaProvider as DoubleChromosomes).GetF(fittest.Genes.Select(x => x.RealValue));        
       FireFinalResults(fittest.Genes.Select(x => x.RealValue), r, f, fittest.Fitness);  
     }  
     public IGAProvider GetGaProvider()  
     {  
       var gaProvider = new DoubleChromosomes(TargetF, TargetR){MaxGenerations = _maxGenerations};  
       return gaProvider;  
     }  
     private void FireGenerationArrived(int generation, double fitness, double r, double f)  
     {  
       var h = NewGeneration;  
       if (h == null)  
         return;  
       h(this, new GenerationEventArgs(generation, fitness, r, f));  
     }  
     private void FireFinalResults(IEnumerable<double> geneValues, double r, double f, double fitness)  
     {  
       var h = FinalResults;  
       if (h == null)  
         return;  
       h(this, new FinalResultsEventArgs(geneValues, r, f, fitness));  
     }  
     public bool IsRunning { get { return _isRunning; } set { _isRunning = value; OnPropertyChanged(); } }  
     public int PopulationSize { get; set; }  
     public int MaxGenerations {   
       get { return _maxGenerations; }   
       set { if (gaProvider != null) { gaProvider.MaxGenerations = value; } _maxGenerations = value; OnPropertyChanged(); }   
     }  
   }  

The NotifyingObject that both view model and model derive from is a useful class for implementing the INotifyPropertyChanged interface:

 public abstract class NotifyingObject : INotifyPropertyChanged  
   {  
     protected void OnPropertyChanged([CallerMemberName] string propertyName = "")  
     {  
       var eh = PropertyChanged;  
       if (eh != null)  
         eh(this, new PropertyChangedEventArgs(propertyName));  
     }  
     public event PropertyChangedEventHandler PropertyChanged;  
   }  

and the RelayUICommand is an extension of the RelayCommand class which simply adds a Text property for the button caption.
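
Since the command class is not listed here, the following is a minimal sketch of what such a command could look like.  The RelayCommand base class is not shown in this post, so this version implements ICommand directly; the constructor shape is inferred from how the view model calls it, and the RaiseCanExecuteChanged method is an assumption added so the sketch is usable on its own.

```csharp
using System;
using System.Windows.Input;

// Sketch only: implements ICommand directly because the RelayCommand base class is not shown.
public class RelayUICommand : ICommand
{
    private readonly Action<object> _execute;
    private readonly Predicate<object> _canExecute;

    public RelayUICommand(string text, Action<object> execute, Predicate<object> canExecute = null)
    {
        if (execute == null) throw new ArgumentNullException("execute");
        Text = text;
        _execute = execute;
        _canExecute = canExecute;
    }

    // Caption for the button bound to this command.
    public string Text { get; private set; }

    public bool CanExecute(object parameter)
    {
        // With no predicate supplied the command is always enabled.
        return _canExecute == null || _canExecute(parameter);
    }

    public void Execute(object parameter)
    {
        _execute(parameter);
    }

    public event EventHandler CanExecuteChanged;

    // Assumed helper: lets the owner tell the UI to re-query CanExecute.
    public void RaiseCanExecuteChanged()
    {
        var handler = CanExecuteChanged;
        if (handler != null)
            handler(this, EventArgs.Empty);
    }
}
```

A button can then bind its Command to the view model property and its Content to the Text of that command, so the caption lives alongside the behaviour.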

The app that I created allows the user to set the number of chromosomes in the population and the maximum number of generations to allow.  As the algorithm progresses, the values of R and F are plotted along with the fitness, all for the best solution of the current generation.  As a reminder, the target values are R=0.95 and F=0.5, a fitness of 1 would be the ideal, and the out of the box SwapMutate and single point Crossover operators are being employed.  Equally, the fitness evaluation has not changed.

I have produced videos of the output (slowed down) using a population of just 4 chromosomes and 10000 chromosomes to compare the results.  Due to the very small population size of the first the algorithm actually evaluates to a worse solution than one found earlier in the evolution, but this highlights the plotting of the results.



Tuesday, 31 March 2015

Quantum Fork

Quantum Fork?

What, you may ask?  Well, as I mentioned in the post yesterday, the need to clearly and effectively communicate ideas, requirements and specifications is vital to a successful software project delivery, and vital to the sanity of the developers, by removing the pressure from managers to quantify what is clear to a dev but not to others.  So this blog is taking a change, a fork you may say.  The investigation of techniques for solving complex maths problems will continue, but alongside that will be a discussion of methods to communicate between members of an interdisciplinary team (and with those outside the team) in a clear way, preferably without the need to 'translate' the information for each audience member. Quantum Fork, as you can follow both paths simultaneously; think of yourself as a superposition of readers.

The decision to rename the blog and include posts on both subjects was made so that you will all reap the benefit of getting all the content I put out, and I will get the benefit of not losing you as readers due to a change of URL.

I look forward to posting my first communication themed musings over the next week.