The fight against technical debt is one of the major challenges for any CTO or CIO. The funny thing about technical debt is that we often associate it with spaghetti: a clear picture that says that if you try to clean a small part, you will probably have to clean everything. But this spaghetti picture of the IT mess carries a bias:
We overestimate the links and underestimate the contextual knowledge about the linked components;
We treat the nodes as black boxes, yet these black boxes can be really complex, and they keep changing over time.
The spaghetti nightmare
The Cabling Technical Debt: Awful Yellow Spaghetti.
What we expect to see in any data center: clean and labeled cabling. Note: the label on the cable gives you an entry point to understand the context of the cable.
Most of the visualization tools we have seen are graph based (nodes and links). Seen at a large scale, these visualizations also look like spaghetti, and it is funny to see how we reduce our visualization capabilities to what we already know (or to the paradigm we believe to be true).
The digital world is much more like connected broccoli
Reducing IT to spaghetti was probably a good idea at the time we were racking and pulling network cables all across the data centers, but those times are over. This is the time of virtualization and Software As Everything, and consequently IT is much, much denser than any spaghetti.
For those who are not experts, the artistic drawings that probably give a better idea of what this digital world looks like are fractal drawings. Indeed, fractal drawings have something in common with the digital world: there are component repetitions everywhere!
When mapping the digital world, we are trying to map something we cannot see. We, as operational experts, can only imagine what the runtime system looks like from some architecture diagrams, and reconcile that picture with the technical operations we perform on the runtime. These technical operations define our expertise and our silo in the organization.
Some organizations have a LOT of silos and experts, and most of the time it takes a lot of work to improve the communication between them. This is one of the targets of DevOps. And clearly, one of the major weaknesses of current DevOps tools is visualization and knowledge management.
Fractals in Ghost In The Shell
Information system visualization in Ghost In The Shell. This guy browses the running system the way I can walk through an unknown city with Google Maps.
In the end, the Ghost In The Shell visualization tools seem much more complete, realistic and usable than any product on the market today. Note the repetitive graphic patterns, which could be built on a big fractal base.
Looking back at the software layer and the developers' work, there is something fascinating when you realize that, whatever the language and the complexity, they all follow the same basic patterns and models. These simple patterns and models allow you to build almost everything! This is the reason why I personally prefer to compare the digital world to connected broccoli instead of spaghetti...
Broccoli is a fractal too ;)
Improving the digital world mapping with fractals and graphs
A refresher on fractals
The word fractal was introduced by Benoit Mandelbrot in 1975, but it is the result of four centuries of mathematical research.
Mandelbrot's definition of fractals
A fractal is "a rough or fragmented geometric shape that can be split into parts, each of which is (at least approximately) a reduced-size copy of the whole".
Fractals are particularly interesting when it comes to setting up efficient visualization algorithms that build rough objects. Such an algorithm begins with an initiator, which can be a really simple Euclidean object, and a generator, which is a simple rule you can iterate infinitely starting from the initiator.
The Koch snowflake is a famous fractal that follows a self-similar pattern. It perfectly illustrates how easily we can build a really complex object by iterating simple geometric operations.
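To make the initiator and generator idea concrete, here is a minimal Python sketch of the Koch snowflake. The function names (koch_step, koch_snowflake, perimeter) are illustrative choices, not taken from any library. The initiator is an equilateral triangle with sides of length 1, and the generator replaces every segment with four segments one third as long.

```python
import math


def koch_step(points):
    """Apply the generator once: replace every segment with four shorter ones."""
    cos60, sin60 = 0.5, math.sqrt(3) / 2
    result = []
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        # One third of the segment, as a vector.
        dx, dy = (x2 - x1) / 3.0, (y2 - y1) / 3.0
        a = (x1 + dx, y1 + dy)            # end of the first third
        b = (x1 + 2 * dx, y1 + 2 * dy)    # start of the last third
        # Tip of the bump: the middle third rotated by 60 degrees around a.
        peak = (a[0] + dx * cos60 - dy * sin60,
                a[1] + dx * sin60 + dy * cos60)
        result.extend([(x1, y1), a, peak, b])
    result.append(points[-1])
    return result


def koch_snowflake(iterations):
    """Start from the initiator (a triangle) and iterate the generator."""
    height = math.sqrt(3) / 2
    points = [(0.0, 0.0), (0.5, height), (1.0, 0.0), (0.0, 0.0)]
    for _ in range(iterations):
        points = koch_step(points)
    return points


def perimeter(points):
    """Total length of the closed polyline."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


if __name__ == "__main__":
    for n in range(5):
        pts = koch_snowflake(n)
        print(n, len(pts) - 1, round(perimeter(pts), 4))
```

Each iteration multiplies the number of segments by 4 and the perimeter by 4/3, so a very simple rule quickly produces a very detailed outline.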
Improving the digital world visualization means improving the digital world fractal initiator
It has been demonstrated that complex networks are self-similar. That means they follow a power law, and that the tools of fractal geometry can be applied to the study of these networks.
When speaking about the Internet, we generally describe a big graph with routers as nodes and links between them. But this model is also inefficient, as it reduces the routers to black boxes and, consequently, does not help the expert who needs to see the running structures inside these routers. Basically, there is another graph inside the router, and it is part of the global worldwide graph.
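The idea of a graph inside a node can be sketched in a few lines of Python. Everything here is hypothetical and only for illustration: the Node class, the link helper, and the router internals (eth0, eth1, forwarding) are assumptions, not a description of any real router or tool.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """A graph node that may itself contain a graph (hypothetical model)."""
    name: str
    kind: str
    children: List["Node"] = field(default_factory=list)  # "contains" relation
    links: List[str] = field(default_factory=list)        # peers at the same level

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Node"]]:
        """Yield (depth, node) pairs: each node first, then its inner graph."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


def link(a: Node, b: Node) -> None:
    """Record an undirected link between two nodes of the same level."""
    a.links.append(b.name)
    b.links.append(a.name)


def build_example() -> Node:
    network = Node("network", "network")
    r1 = network.add(Node("router-1", "router"))
    r2 = network.add(Node("router-2", "router"))
    link(r1, r2)
    # Inner graph of router-1 (made-up internals, for illustration only).
    eth0 = r1.add(Node("eth0", "interface"))
    eth1 = r1.add(Node("eth1", "interface"))
    forwarding = r1.add(Node("forwarding", "process"))
    link(eth0, forwarding)
    link(forwarding, eth1)
    return network


if __name__ == "__main__":
    for depth, node in build_example().walk():
        print("  " * depth + f"{node.kind}:{node.name} -> {node.links}")
```

The same node type is reused at every level, so the router is no longer a black box: zooming into it reveals another graph built from the same two ingredients, nodes and links.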
The ability to look inside the digital components becomes more and more important as we push more and more business onto these components, and as we densify the digital architecture with new technologies like VMs or containers.
Considering the graph objects (i.e. nodes and links) as the fractal initiator of the worldwide network, it would be legitimate to improve this initiator in order to provide a new and complete way to visualize the complex runtimes we set up in many businesses. This will be discussed in the next article.
Visualization is a key factor in any software sale. One of the sexiest visualization tools to show any CIO or CTO is mapping. This is why you will see lots of products giving you a taste of IT mapping, even if they do not share the same purpose and work differently. Let's go on a little tour of the market...
The ITSM tools
Basically, they act as a reference repository built on a CMDB, plus some mapping visualization and links to ticket tracking in order to follow the changes on each object in the repository. Most of them provide some automated discovery of the physical assets in your data centers, and then run some commands on the identified assets to look at the services running on them and their relationships.
These tools are able to display an IT map, but only at a coarse granularity (assets and services), given that they target managers or high-level architects. This granularity does not fit the OPS and DEV teams, who look for the network context of services and assets and for internal details in order to understand or fix a problem. Unfortunately, the devil is in the details...
The monitoring and APM tools
Performance monitoring and alerting are the core functions of monitoring and APM tools, but these tools offer some maps as well. There are many monitoring tools, and each of them has a specific target: System OPS, Database OPS, Middleware OPS, Application OPS... This means you will have as many tools as you have teams and silos in your company. Then, to get the complete map, or at least a complete part of your global map, you will need to reconcile data and expertise, which can take time...
System monitoring with Nagios
Probably one of the oldest system monitoring solutions on the market, Nagios offers several visualization tools. Among them is this one, which gives you the layer 3 map of your network and the devices on it.
Network monitoring with Netbrain
Netbrain can help you discover the network path (layer 3 or layer 2) between two points (and more).
Application monitoring with ITRS Geneos
ITRS Geneos helps you build dashboards aggregating monitoring data coming from your distributed system. This is a pretty map indeed, with contextual data... Just one problem: it must be built and updated manually.
Application Performance Monitoring with AppDynamics
AppDynamics is a famous APM tool which can show you the world from the application perspective (Layer 5).
Some other inspiring maps
Netflix microservices visualisation tool
Netflix is working hard to master its microservices architecture. Here is another visualization example. Extract from "A Microscope on Microservices": "This internal utility, Slalom, allows a given service to understand upstream and downstream dependencies, their contribution on service demand, and the general health of said requests."
Simianviz / Spigo
@adrianco, the father of the Netflix microservices architecture, is currently working on a simulation tool to visualize different architectures (NetflixOSS, LAMP...).
Another specialized graph for Kubernetes users... Note the "contains" links, which could lead to a bigraph visualization.
Is there anything missing?
As already said, these specialized maps are the result of the extreme siloing of OPS expertise. But wait: aren't we forgetting some population here? DEV? Ah yes! Is there any kind of map for the DEV dudes? Something that helps visualize threads or actors? Not really. DEVs use tools like KCachegrind or other profilers to browse the stack and see where time is consumed.
KCachegrind in action
Used in DEV environments only, profilers like KCachegrind are helpful for developers.
Unfortunately, these tools consume a LOT of CPU and memory and are therefore never used in production, and sometimes they consume so much that you are unable to reproduce a problem while using them... And finally, they do not show the threads/actors topology, but aggregated statistics on the run stack and the call order tree.
Akka actors in action
This is how @hicolour specified a scalable load balancer with Akka. There is currently no tool able to represent this granularity. However, all these actor definitions are the keys to understanding how this dynamic system works.
But many problems come from unmapped components like processes, threads or actors. After all, these components are the basis of the application's architecture. The ability to map such components brings us to a new area where OPS can communicate with DEV much more easily and so solve problems more efficiently.
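As a very small illustration of what mapping such components could mean, the Python sketch below builds a "process contains threads" map for the current process. It is only a sketch under strong assumptions: it sees nothing but the Python-level threads of its own process, the dictionary layout is made up for this example, and the worker threads are dummies started just for the demo. A real mapping tool would also need the links between these components.

```python
import os
import threading


def map_process_threads() -> dict:
    """Return one process node with a 'contains' link to each live thread."""
    process = {"kind": "process", "pid": os.getpid(), "contains": []}
    for thread in threading.enumerate():
        process["contains"].append(
            {"kind": "thread", "name": thread.name, "daemon": thread.daemon}
        )
    return process


def demo() -> dict:
    """Start two idle worker threads, snapshot the map, then stop them."""
    stop = threading.Event()
    workers = [
        threading.Thread(target=stop.wait, name=f"worker-{i}") for i in range(2)
    ]
    for worker in workers:
        worker.start()
    snapshot = map_process_threads()
    stop.set()
    for worker in workers:
        worker.join()
    return snapshot


if __name__ == "__main__":
    snapshot = demo()
    print(f"process {snapshot['pid']}")
    for thread in snapshot["contains"]:
        print(f"  contains thread {thread['name']}")
```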