Prototyping is one of the activities every developer should practice, and probably the most time-consuming one for a software architect. We believe prototypes are extremely useful for completing the analysis and fitting the problem into the developer’s mind.
Think about building a wall. Assume we have basic knowledge of the materials and physics involved. If it’s the first time we are facing this (apparently simple) task, we need to experiment and reduce the problem to a small scale. We can try with play blocks, we can draw the project on a sheet of paper first, and we can build small units to verify our assumptions. If needed, we can build and destroy them several times, until we are satisfied with the result and it meets the initial requirements we received.
Writing software isn’t too different from constructing buildings: the choices we make now propagate to the refinements and additions we will make in the near future. The only difference is that software is more likely to change. This is true not because software needs to change, but because letting it change is easier than in other industries (think about changing the foundations of a real building: it can be challenging, if not impossible, and far more costly).
Software changes because business requirements change too. More than in other engineering fields, software engineers have always tried to plan ahead, solving a bigger problem instead of the local problem that needs to be solved today.
We think this approach can be tricky. On one side, over-engineering a software system has the concrete benefit of capturing future requirements with less intervention. On the other side, we become involved in something that is not really clear and that does not fit completely in our mind when the creative process starts.
Prototyping can help us reach the right trade-off between the present and the future, highlighting issues in advance so we can make the appropriate decisions. We can prototype a feature, a design, a complete distributed system, and even new requirements that are not part of the current project but may be included at some point in the future. This creative process helps us choose what to implement now, what to implement later, and what to exclude from the project because its complexity is too high for the current requirements.
Finally, prototyping is an exercise for every software developer. We are constantly exposed to changes in technology and to new tools and platforms, so experimenting can both show us the right way and motivate us to study and apply innovation iteratively.
Let’s start by narrowing the cases we can work on down to two scenarios. There is a big difference between developing new projects and maintaining existing ones. Some of the differences can be summarized as follows:
- New projects:
  - Requirements are being analyzed for the first time (except for rewrites) and the context is not yet clear
  - We can choose the patterns we would like to use, even if they are not the optimal ones. Don’t misunderstand: they should be, but when no code has been written yet, we have more freedom
  - It is not clear how the project (or the overall process built on top of it) will evolve, which introduces a potential bias into the project itself
- Existing projects:
  - Although the requirements for new features are new as well, the surrounding project is well known, and the reference domain is very likely well known too
  - We can’t always choose the patterns and, even in the rare case we introduce a new logical building block, we will probably need to inject it into an existing infrastructure
  - Although, again, it may not be clear how the project will evolve, we have some historical information that can help us fit the problem better than in new projects
The caching problem
Let’s say we have a caching problem to solve. The requirement is to introduce a caching layer between the data producer and the data consumer, to improve performance and reduce the latency caused by the underlying data store.
There are at least two caching approaches:
- On-demand caching: the data consumer looks up the cache and, in case of a cache miss, provides an implementation that materializes the actual data and puts it in the cache
- Active caching: someone (even an external actor) proactively materializes objects in the cache at a given frequency. Data consumers rely on the cached data since they are confident of a cache hit.
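A minimal sketch of the active approach, assuming an in-process `ConcurrentDictionary` as the cache. The hypothetical `ActiveCache` class exposes a `Refresh` method that a producer (for example a scheduled job, not shown) would call at a given frequency. Consumers only read and have no way to recover from a miss.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical active cache: a producer pushes materialized items in,
// consumers only read and never materialize data themselves.
public class ActiveCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, TValue> items =
        new ConcurrentDictionary<TKey, TValue>();

    // Called by the producer at a given frequency. Entries are upserted;
    // this sketch does not remove keys that disappeared from the snapshot.
    public void Refresh(IEnumerable<KeyValuePair<TKey, TValue>> snapshot)
    {
        foreach (var pair in snapshot)
        {
            items[pair.Key] = pair.Value;
        }
    }

    // Consumers assume a hit; a miss is reported but not recovered here.
    public bool TryGet(TKey key, out TValue value) => items.TryGetValue(key, out value);
}
```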
The first approach assumes the data consumer knows both the data producer layer and the caching layer. This makes it more resilient when, for example, the caching layer is not available (or is full, in the case of in-memory caching). However, if multiple concurrent clients request the same resource at the same time, each one produces a miss and, consequently, the materialization function runs multiple times. The second approach, instead, minimizes access to the underlying data store. As a downside, clients assume the objects exist in the cache, with no way to recover from a cache miss.
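The concurrent-miss problem of on-demand caching can be mitigated in several ways. One option is sketched here, assuming a single-process cache: store a `Lazy<T>` per key in a `ConcurrentDictionary`, so that concurrent callers missing on the same key share one materialization. The `OnDemandCache` class is hypothetical and is a technique we introduce here, not one from the original text.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical on-demand cache: consumers supply the materialization function.
// Values are wrapped in Lazy<T> so that concurrent misses on the same key
// trigger a single materialization instead of one per caller.
public class OnDemandCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Lazy<TValue>> items =
        new ConcurrentDictionary<TKey, Lazy<TValue>>();

    public TValue Get(TKey key, Func<TKey, TValue> materialize)
    {
        // Under contention GetOrAdd may create several Lazy wrappers, but only
        // one is stored and returned to every caller; the others are discarded
        // without ever being evaluated.
        var lazy = items.GetOrAdd(key, k => new Lazy<TValue>(
            () => materialize(k), LazyThreadSafetyMode.ExecutionAndPublication));
        return lazy.Value;
    }
}
```

With `ExecutionAndPublication`, an exception thrown by `materialize` is cached inside the `Lazy<T>` too, so a real implementation would need to evict failed entries. This sketch also has no expiration.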
There is a third alternative that mixes the previous two. A producer continuously materializes the cache to reduce the load impact, while data consumers use on-demand caching for the residual cases where items are not in the caching layer. Whether we are working on a new project or an existing one, we must adopt a strategy to cache data.
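A sketch of the mixed strategy, again assuming an in-process cache. The hypothetical `HybridCache` lets a producer warm the known hot keys, while consumers fall back to on-demand materialization. Note that `ConcurrentDictionary.GetOrAdd` may run the factory more than once for the same key under contention, so the concurrent-miss issue described above still applies to the residual cases.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical hybrid cache: a producer warms the cache periodically,
// while consumers fall back to on-demand materialization for residual misses.
public class HybridCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, TValue> items =
        new ConcurrentDictionary<TKey, TValue>();

    // Producer side: proactively (re)materializes the known hot keys.
    public void Warm(IEnumerable<TKey> hotKeys, Func<TKey, TValue> materialize)
    {
        foreach (var key in hotKeys)
        {
            items[key] = materialize(key);
        }
    }

    // Consumer side: a hit is the common case; on a miss, materialize and store.
    public TValue Get(TKey key, Func<TKey, TValue> materialize) =>
        items.GetOrAdd(key, materialize);
}
```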
Writing from scratch
Let’s assume we need to cache the result of a database query in an in-memory cache. The high-level code that performs this simple operation could look like the following listing.
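```csharp
// Fictitious flow: cache and db are dynamic, so no types need to exist yet.
private dynamic GetMyData(dynamic cache, dynamic db, string query)
{
    // Use the query text itself as the cache key.
    var cachedItem = cache.Get(query);
    if (cachedItem != null) return cachedItem;

    // Cache miss: materialize the data and populate the cache.
    var materialized = db.Query(query);
    cache.Set(query, materialized);
    return materialized;
}
```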
The flow is very simple:
- We first ask the cache for the key we want (for a SQL-like query, a good cache key can be derived from the query text itself)
- If the cached item is found, we return that value
- Otherwise, we query the database and populate the cache before returning the value to the caller
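To make the key derivation concrete, here is a minimal sketch that hashes the exact query text with SHA-256, so that long queries produce fixed-length keys. `QueryCacheKey` is a hypothetical helper. If the query is parameterized, the parameter values must also become part of the key.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: derives a fixed-length cache key from a SQL-like query text.
public static class QueryCacheKey
{
    public static string For(string query)
    {
        using (var sha = SHA256.Create())
        {
            // Identical query texts always produce identical keys.
            var hash = sha.ComputeHash(Encoding.UTF8.GetBytes(query));
            // 32 hash bytes become 64 hex characters after removing the dashes.
            return "query:" + BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```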
The fictitious GetMyData function shown earlier sets up a tentative flow. The dynamic keyword lets us write all the code we want without first implementing the class and interface definitions. If the micro-design is convincing, we can then replace the dynamic occurrences with strongly typed arguments and variables (except where dynamic is really needed), or proceed with further refinements.
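Here is a sketch of what that strongly typed refinement could look like. `ICache`, `IDatabase`, `DictionaryCache` and `FakeDatabase` are hypothetical types introduced for illustration. They do not come from the original text or from any library.

```csharp
using System.Collections.Generic;

// Hypothetical abstractions replacing the dynamic parameters of GetMyData.
public interface ICache
{
    bool TryGet<T>(string key, out T value);
    void Set<T>(string key, T value);
}

public interface IDatabase
{
    T Query<T>(string query);
}

public static class DataAccess
{
    // Same flow as GetMyData, but the compiler now checks every call.
    public static T GetMyData<T>(ICache cache, IDatabase db, string query)
    {
        if (cache.TryGet(query, out T cachedItem)) return cachedItem;
        var materialized = db.Query<T>(query);
        cache.Set(query, materialized);
        return materialized;
    }
}

// Minimal in-memory ICache; null values are treated as misses, as in GetMyData.
public class DictionaryCache : ICache
{
    private readonly Dictionary<string, object> items = new Dictionary<string, object>();

    public bool TryGet<T>(string key, out T value)
    {
        if (items.TryGetValue(key, out var raw) && raw is T typed)
        {
            value = typed;
            return true;
        }
        value = default(T);
        return false;
    }

    public void Set<T>(string key, T value) => items[key] = value;
}

// Fake database for experimenting: counts queries and assumes T is string.
public class FakeDatabase : IDatabase
{
    public int Calls { get; private set; }

    public T Query<T>(string query)
    {
        Calls++;
        return (T)(object)("result of " + query);
    }
}
```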
Now that we know the simplest flow that solves the caching problem, we can consider the broader context:
- Should the caching layer be applied to multiple queries?
- In multiple parts of the application?
- With multiple types of persistence layers?
- What if we change the in-memory cache implementation?
- And how can we make it DRY?
The number of questions we can ask is probably proportional to the experience we have gained in software development, BUT the underlying risk is overestimating the big picture and over-engineering the whole application.
We suggest discussing the solution to raise as many questions and doubts as possible in the initial phase; however, focus on the actual requirements. If the requirement is “we have a single query to optimize”, the solution above is more than enough. Otherwise, we can choose a pattern based on the superset of aspects we need to address, or even on the average case.
Injecting behaviors in existing code
Unfortunately, we are not always lucky enough to start a project from scratch. Very often, the starting point is an existing, complex system, tangled enough that touching production code with pervasive refactoring could be dangerous.
However, these are the situations where we can experiment the most, since we have to think through every angle (360 degrees) to fit other people’s problems into our mind. What we find in an existing project can vary a lot, so we make some assumptions to proceed.
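```csharp
var orders = dal.GetOrdersFor(10);
…

// Requires "using System.Linq;" for Where and ToArray.
public class DAL
{
    public Order[] GetOrdersFor(int customer)
    {
        // Dispose the context once ToArray has materialized the query.
        using (var db = new MyEntities())
        {
            return db.Orders.Where(p => p.CustomerID == customer).ToArray();
        }
    }
}
```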
In the previous listing, we assume we are using Entity Framework (for “MyEntities”). However, we can even proceed by faking everything until we are ready to apply the concept to the real classes. In the companion code repository, you will find a fake implementation of the Order and MyEntities classes, just to let everything compile.
The code above is not production code: for example, we could obtain our Entity Framework context from a factory or have it injected. We are missing many aspects of a real-world context, but this can be a simplified version of a common starting point for data access.
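Here is a sketch of the injection option, in which the DAL receives a `Func<MyEntities>` factory. The `Order` and `MyEntities` classes below are our own minimal fakes with assumed shapes. They are not the ones from the companion repository or from Entity Framework, and they let the sketch compile without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical fakes standing in for the Entity Framework types.
public class Order
{
    public int CustomerID { get; set; }
}

public class MyEntities : IDisposable
{
    public List<Order> Orders { get; } = new List<Order>();
    public void Dispose() { }
}

// The DAL receives a factory instead of calling "new MyEntities()" itself,
// so tests (or a caching prototype) control how contexts are created.
public class DAL
{
    private readonly Func<MyEntities> contextFactory;

    public DAL(Func<MyEntities> contextFactory)
    {
        this.contextFactory = contextFactory;
    }

    public Order[] GetOrdersFor(int customer)
    {
        using (var db = contextFactory())
        {
            return db.Orders.Where(p => p.CustomerID == customer).ToArray();
        }
    }
}
```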
We can now add caching to the original DAL class as follows:
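```csharp
using System;
using System.Linq;
using System.Runtime.Caching;

public class DAL_V2
{
    private readonly MemoryCache cache = MemoryCache.Default;

    public Order[] GetOrdersFor(int customer)
    {
        // The cache key combines the method name and its argument.
        var key = $"GetOrdersFor_{customer}";
        var cachedItem = cache.Get(key);
        if (cachedItem != null) return (Order[])cachedItem;

        // Cache miss: query the database and materialize the result.
        using (var db = new MyEntities())
        {
            var result = db.Orders
                .Where(p => p.CustomerID == customer)
                .ToArray();
            // Keep the entry for 10 seconds after its last access.
            cache.Set(key, result, new CacheItemPolicy
            {
                SlidingExpiration = TimeSpan.FromSeconds(10)
            });
            return result;
        }
    }
}
```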
We decided to introduce a dependency on MemoryCache, from the System.Runtime.Caching namespace. MemoryCache is a concrete implementation of the abstract class ObjectCache, which defines the generic API of an object cache. System.Runtime.Caching is a .NET Framework (4.0 and later) assembly, and there is a package for .NET Standard too. It was inspired by the ASP.NET cache and offers similar features. However, it was designed without dependencies on the System.Web namespace, so that memory caches can run outside the context of an ASP.NET web application. Another benefit of MemoryCache (compared to System.Web.Caching.Cache) is the ability to create multiple MemoryCache instances in the same AppDomain.
We used MemoryCache to implement this workflow:
- Look up the value in the cache by key
- In case of a hit, the value is cast and returned
- In case of a miss, the materialization logic is executed and, finally, the cache is populated
However, this code couples the caching logic with the database logic and requires a lot of duplicated code for every data-access method. Still, for the sake of simplicity, it can serve as the proof of concept to experiment with, test, and collect some metrics on the feature we need.
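One way to remove this coupling is the decorator pattern, sketched here as our assumption rather than the authors’ design. We extract a hypothetical `IOrderRepository` interface from the DAL and wrap it with a caching implementation. An in-memory array stands in for Entity Framework so the sketch compiles on its own.

```csharp
using System.Collections.Concurrent;
using System.Linq;

public class Order { public int CustomerID { get; set; } }

// Hypothetical abstraction extracted from the DAL class.
public interface IOrderRepository
{
    Order[] GetOrdersFor(int customer);
}

// Hypothetical data access: an in-memory array stands in for the database.
public class InMemoryOrderRepository : IOrderRepository
{
    private readonly Order[] orders;
    public int Queries { get; private set; }

    public InMemoryOrderRepository(params Order[] orders) { this.orders = orders; }

    public Order[] GetOrdersFor(int customer)
    {
        Queries++;
        return orders.Where(o => o.CustomerID == customer).ToArray();
    }
}

// Decorator: adds caching around any IOrderRepository without touching its code.
public class CachedOrderRepository : IOrderRepository
{
    private readonly IOrderRepository inner;
    private readonly ConcurrentDictionary<int, Order[]> cache =
        new ConcurrentDictionary<int, Order[]>();

    public CachedOrderRepository(IOrderRepository inner) { this.inner = inner; }

    public Order[] GetOrdersFor(int customer) =>
        cache.GetOrAdd(customer, c => inner.GetOrdersFor(c));
}
```

Unlike DAL_V2, this sketch never expires entries. The data-access class, however, no longer knows that caching exists.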
Remember that all POCs share the goal of experimenting with something we do not know very well, either because we have never done it before or because it is very complex.
In fact, POCs can be made at different levels:
- Problem-based: we need to solve a new problem and we want to verify the validity of the model
- Technology-based: the problem is known but we want to experiment with a new technology to solve it
- Design-based: the problem and the technology are known and we want to experiment with patterns and coding techniques
The first two help us focus on the problem and on the right technology. Among the previous samples, the fictitious GetMyData function represents a POC of the caching problem, while DAL_V2 represents a POC that applies the problem to actual technologies (the System.Runtime.Caching.MemoryCache implementation plus Entity Framework).
Prototyping designs
The designs of the previous samples are poor, since they require a lot of duplicated code and, consequently, a lot of maintenance effort.
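Here is a sketch of how the duplicated lookup, materialize and store flow could be written once. It assumes a hypothetical `CacheAside` helper backed by a `ConcurrentDictionary`. Unlike DAL_V2, it uses an absolute time-to-live rather than a sliding expiration. The clock is injectable only to make the behavior testable.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical reusable cache-aside helper: the lookup/materialize/store flow
// is written once, and each data-access method only supplies a key and a loader.
public class CacheAside
{
    private readonly ConcurrentDictionary<string, (object Value, DateTime ExpiresAt)> items =
        new ConcurrentDictionary<string, (object Value, DateTime ExpiresAt)>();
    private readonly Func<DateTime> now;

    public CacheAside(Func<DateTime> now = null)
    {
        this.now = now ?? (() => DateTime.UtcNow);
    }

    public T GetOrLoad<T>(string key, TimeSpan ttl, Func<T> load)
    {
        // Hit: the entry exists, has not expired and has the expected type.
        if (items.TryGetValue(key, out var entry) && entry.ExpiresAt > now() && entry.Value is T cached)
            return cached;

        // Miss or expired: materialize and store with an absolute expiration.
        var value = load();
        items[key] = (value, now() + ttl);
        return value;
    }
}
```

Each DAL method body would then shrink to a single call, such as `return cache.GetOrLoad($"GetOrdersFor_{customer}", TimeSpan.FromSeconds(10), () => LoadOrders(customer));`, where `LoadOrders` is a hypothetical method containing only the database query.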
Many design principles can be discussed as a trade-off between simplicity and maintainability. Let’s say that maximum simplicity stands for “code that can be read and written by everyone”, like the DAL_V2 sample. At the opposite end, we have complex topologies and generalization strategies that pursue cleanliness, decoupling and reuse. It is widely recognized that the first approach often requires much more maintenance effort, while the second requires less, but demands more developed skills.
The boundaries of generalization in software development must be evaluated project by project. The experienced reader may say “it depends on the context”, and that is true: in some circumstances, we can break all the “best practices” and produce artifacts that are more maintainable than the engineered ones. However, in most cases, the benefits of the DRY (Don’t Repeat Yourself) approach are real, which is why patterns are so widely discussed in software engineering. Finally, we want to propose an extension to the DRY approach, which would sound like “Don’t Repeat Yourself, when your design has been proven”.
It is dangerous to blindly apply design patterns, or to “layerize” software written for new features, when those features may hide a lack of specific knowledge of the underlying domain. Taking e-commerce as an example, let’s say we need to add a “couponing” feature that lets users redeem codes during checkout to obtain benefits. The existing e-commerce software may emphasize reuse, be well written, and apply patterns and best practices.
If we are new to the couponing domain, why should we fit into the existing layers from the beginning, when we can proceed as outsiders until the scope is clear?
We will definitely repeat ourselves during the POC phase, since the most important achievement in that phase is knowledge of the problem and validation of the solution. Should we put “outsider” code into production? We can, as long as the results confirm our hypothesis.
Once we are aware of the context and have gained knowledge of the scope we are working in, we often want to experiment with designs to make our code more maintainable, cleaner, and better integrated with the rest of the application.
Refactoring should take up a substantial portion of a developer’s working day, since it pursues both quality and a refreshed knowledge of the specific topic. Periodic refactoring can suggest to developers optimizations and coding approaches that were not known at the time of writing.
Companion code here:
https://github.com/childotg/iSolutionsLabs