Skip to main content

Enabling IoT to establish a sustainable value chain


I wrote an article for CIO Applications, here's archive of it

IoT devices are getting more and more intelligent and can now create meshed networks by itself, switching from a sensor into an actor and transferring informations only for the meshed neighbors. For example a connected car could tell the future home that the homeowner will be at home in 5 minutes and the garage door and the door need to be unlocked in time, the lights need to be switched on and the grid operator needs to be informed that the wallbox now charges with 22KW. In near future this will happen over direct meshed information cells, operated by always connected devices, wearables, sensors, actors, mobile devices - short: everything. And all cloud provider offer dozens of solution to master the challenges, on the one, other or complete different way.

Self-organizing mesh networking and communication comes with a permanent flow of information, massive IoT data streams even classic Big Data frameworks like Hadoop cannot handle anymore in time. Coming along with the art of data, the need for data processing changes with the kind of data creation and ingestion. Most analyses will be done on the edge and during the ingestion stream when the data comes to rest. The data lake should be the central core to store data, but the data needs to get categorized and catalogued together with a proper and well defined schema and data description. The intended use of the gravity such data pools generate needs to be applied as the motor of data driven innovation.

Why? Batched processing helps to predict getting value out of stored data even by analyzing multiple other data points and storage facilities, but not to react in time. And timely information in IoT enables business processes only to have a valuable meaning at the time they occur, to do the job stream processing frameworks like Spark or Kafka are more suitable. Combining both techniques brings unmatched value and impact to the business, driven by the right use of data. Stream processing during the data transportation closes the gap between rapid data and data on rest. Mostly referring to the more costly IoT at edge computing, MQTT enabled stream processing engines deliver high throughput over all kind of compute instances, be it in a local data center, hybrid clouds or in public clouds.

The same is countable for available cloud technology. Every cloud provider has his own IoT solution zoo with his own lock-ins, but often they do not fit to scaling plans either in complexity, missing or not well implemented parts or simply the price model is not comparable to the margin getting from an IoT based product. A combined approach of scalable cloud technology (which fits most) and own development brings the most benefit at an affordable price tag, unspoken of the intellectual property a business gains and holds, instead to bring this to providers and therefore competitors. Independent organisations like “Linux Foundation Edge” provide the most useful insight over Open Source projects and initiatives.

Just dumping data somehow without visions behind does not help to solve the problems companies face on their digital journey, especially when it comes to questions of revenue from IoT projects. Big Data needs to have a nearly perfect data management, data rights and data retention process behind. Only this offers the possibilities to get full advantage of any kind of data, to open new revenues and sales streams and to finally see all data driven activity not as a cost saving project (as the most agencies and vendors promise) but as a revenue creation project. Using modern cloud technologies moves organizations into the data centric world, focusing on business and not operations.

Analyzing the data is the more tricky part here - on the one hand every data point brings valuable input, but on the other hand the unlimited data store also brings vulnerabilities to customer insights. I am a bit concerned about 360 degrees approaches. At first the value part of data collections needs to be questioned: which data is system relevant for support, maintenance or emergency and which is important to generate a sustainable revenue. Using streaming analysis gives valuable input at the point in time the information is needed to make decisions, but also gives the possibility to route data into different data stores. It is always unquestionable that the value of customers is higher than the data gathered, implementing a state-of-the-art data ethic catalogue is one of the main tasks analytics needs to cover.

We move quickly to a so-called interconnected world, always connected systems will dominate our future lives, introducing new business models by combining business areas which were not even in the range of combined business models. The future CIO needs to know what implications the data has, what uncountable values this data can generate but also to weight what threats uncontrollable data collections can cause. Building new data driven business will be the most exciting job in future, things never done before are now possible. Embrace this.

The article can be read online: 


Popular posts from this blog

Deal with corrupted messages in Apache Kafka

Under some strange circumstances it can happen that a message in a Kafka topic is corrupted. This happens often by using 3rd party frameworks together with Kafka. Additionally, Kafka < 0.9 has no lock at at the consumer read level, but has a lock on Log.write(). This can cause a rare race condition, as described in KAKFA-2477 [1]. Probably a log entry looks like: ERROR Error processing message, stopping consumer: ($) kafka.message.InvalidMessageException: Message is corrupt (stored crc = xxxxxxxxxx, computed crc = yyyyyyyyyy Kafka-Tools Kafka stores the offset of every consumer in Zookeeper. To read out the offsets, Kafka provides handy tools [2]. But also can be used, at least to display the consumer and the stored offsets. First we need to find the consumer for a topic (> Kafka 0.9): bin/ --zookeeper management01:2181 --describe --group test Prior to Kafka 0.9 the only possibility to get this inform

Hive query shows ERROR "too many counters"

A hive job face the odd " Too many counters:"  like Ended Job = job_xxxxxx with exception 'org.apache.hadoop.mapreduce.counters.LimitExceededException(Too many counters: 201 max=200)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask Intercepting System.exit(1) These happens when operators are used in queries ( Hive Operators ). Hive creates 4 counters per operator, max upto 1000, plus a few additional counters like file read/write, partitions and tables. Hence the number of counter required is going to be dependent upon the query.  To avoid such exception, configure " mapreduce.job.counters.max " in mapreduce-site.xml to a value above 1000. Hive will fail when he is hitting the 1k counts, but other MR jobs not. A number around 1120 should be a good choice. Using " EXPLAIN EXTENDED " and " grep -ri operators | wc -l " print out the used numbers of operators. Use this value to tweak the MR s

GPT & GenAI for Startup Storytelling

OpenAI and Bard   are the most used GenAI tools today; the first one has a massive Microsoft investment, and the other one is an experiment from Google. But did you know that you can also use them to optimize and hack your startup?  For startups, creating pitch scripts, sales emails, and elevator pitches with generative AI (GenAI) can help you not only save time but also validate your marketing and wording. Curious? Here are a few prompt hacks for startups to create,improve, and validate buyer personas, your startup's mission/vision statements, and unique selling proposition (USP) definitions. First Step: Introduce yourself and your startup Introduce yourself, your startup, your website, your idea, your position, and in a few words what you are doing to the chatbot: Prompt : I'm NAME and our startup NAME, with website URL, is doing WHATEVER. With PRODUCT NAME, we aim to change or disrupt INDUSTRY. Bard is able to pull information from your website. I'm not sure if ChatGPT