QuestDB migration for IoT data

The reason for considering this is to enable IoT data to be captured on a Raspberry Pi, with consolidation to the desktop.

If successful, the existing MongoDB database would be migrated to the desktop.

Evaluation

Some criteria that should be met:

Installation

Assume installation using Docker (at least to start with):

sudo docker run -p 9000:9000 -p 8812:8812 questdb/questdb

Getting started

Basic information

Web Console | Web Console
Web Console (remote) | Web Console (Desktop)
Postgres-compatible port | 8812
File storage | /var/lib/questdb/conf (is this where data is stored?)

Does it need to stay resident, i.e. is it only online while it is running?

To run QuestDB

docker run -d -p 9000:9000 -p 8812:8812 --name questdb questdb/questdb

Create tables

            
    CREATE TABLE packet_data (
        timestamp TIMESTAMP,
        board SYMBOL,
        pin SYMBOL,
        reading DOUBLE
    ) timestamp(timestamp) PARTITION BY DAY;
            
        

This could be input either via the web console or over the Postgres-compatible port (8812).

PARTITION BY DAY stores each day's rows in a separate partition; this keeps out-of-sequence updates cheap (only the affected day's partition is touched) and, later, lets old data be dropped a day at a time.

Loading Data

Wrote a script to capture ha_insert_packet MQTT packets from the network; this has a number of options:

This means that the Pi must be able to store at least 35 packets/sec.

Now need another script to transfer data from MongoDB.

To find the storage used, run du -h against the QuestDB data directory (for the Docker image this is under /var/lib/questdb).

Notice that several CPU cores run at 4.8 GHz and i7 power consumption reaches up to 90 W whilst inserting packets into the database.

Attractive features of QuestDB

QuestDB Website

Loading data from MongoDB


The database in MongoDB is stored in:

database | ha_log
collection | logcache2

The structure of each document in logcache2 is as follows:

            
    {
        board: 101,   // board id for document
        pin: 1,       // pin number
        pstart: ???,  // start timestamp (datetime)
        pend: ???,    // end timestamp (datetime)
        result: [
            {
                T: ???, // timestamp for reading, from d.getTime()
                R: ???  // reading in suitable integer units (milliamps, 1/10 degree, etc.)
            }
            // ... one entry per reading
        ]
    } // each document contains results where T lies between pstart and pend
            
        
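Transferring this into QuestDB means fanning each document's result array out into one row per reading. A minimal sketch of the idea (hypothetical code, not the actual script described below; assumes the mongodb and pg npm packages and QuestDB's default Postgres credentials admin/quest on port 8812):

    const { MongoClient } = require('mongodb');
    const { Client } = require('pg');

    async function transfer(start, end) {
        const mongo = await MongoClient.connect('mongodb://localhost:27017');
        const pg = new Client({ host: 'localhost', port: 8812,
                                user: 'admin', password: 'quest', database: 'qdb' });
        await pg.connect();

        // documents whose time range falls inside [start, end)
        const docs = mongo.db('ha_log').collection('logcache2')
            .find({ pstart: { $gte: start }, pend: { $lt: end } });

        for await (const doc of docs) {
            // one row in packet_data per entry in the document's result array
            for (const r of doc.result) {
                await pg.query(
                    'INSERT INTO packet_data VALUES ($1, $2, $3, $4)',
                    [new Date(r.T), String(doc.board), String(doc.pin), r.R]);
            }
        }
        await mongo.close();
        await pg.end();
    }

    transfer(new Date('2024-09-01T23:00Z'), new Date('2024-09-02T23:00Z'))
        .catch(console.error);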

Wrote a script along these lines, load_from_mongodb.js, to load ha_log/logcache2 data into QuestDB by date range. Loading a couple of days' data gave the following results:

Start date       | End date         | Packets | Packets/sec | Size (MB) | Elapsed (m:s)
2024-09-02T23:00 | 2024-09-04T19:59 | 697222  | 357         | ??        | ??
2024-09-01T23:00 | 2024-09-02T23:00 | 458231  | 360         | 260       | 21:23
2024-08-20T23:00 | 2024-09-01T23:00 | 5411492 | 331         | 1103.984  | 272:40

Summary of results

Total elapsed time | 294m 03s
Packets loaded | 5411492
Storage required | 1103.982 MB
Elapsed time represented | 14 days 20:39 (14.86 days)
Approx storage required for a day | 75 MB
Approx storage required for a year | 27 GB
Approx storage for 6 years | 163 GB
Elapsed time to load 1 year | 112 hours
Elapsed time to load 6 years | 670 hours

The data occupies about 35 GB when stored in MongoDB, so at present there aren't enough reasons to justify doing this.

It is clear from this loading exercise that the limit on loading rate is the rate at which packets are received, rather than the rate at which they can be written to the database.

BUT some substantial improvements can be made by choosing the most appropriate data types:

Column | Current size (bytes) | New type | New size (bytes)
timestamp | 8 | TIMESTAMP | 8
board | 4 | SYMBOL | 4
pin | 4 | SHORT | 2
reading | 8 | SHORT | 2
TOTAL | 24 | | 16

reading as SHORT comes from the UDP packet layout; MQTT packets could carry readings that require more storage.

Making only board a SYMBOL drops the index on pin, which will save space.
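A revised definition along these lines (a sketch applying the table above to the original packet_data DDL):

    CREATE TABLE packet_data (
        timestamp TIMESTAMP,
        board SYMBOL,
        pin SHORT,
        reading SHORT
    ) timestamp(timestamp) PARTITION BY DAY;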

What data is stored

UDP packets are delivered from up to 2 Ethernet-connected boards at each position. These UDP packets have a fixed format as follows:

Item    | Start byte | End byte | Possible values | Description
Command | 1 | 1 | D,N,Y,T,L,S | Command, which determines the action and the meaning of the remainder of the packet
Target  | 2 | 2 | 0 to 255 | Intended recipient of packet
Program | 3 | 3 | 0 to 255 | Used by Y,N,T,D commands to control a set of things that are grouped together (typically lights)
Source  | 4 | 4 | 0 to 255 | Board or device that sourced the packet
Reason  | 5 | 5 | Code | Reason code (see source for interpretation)
Pin     | 6 | 6 | 0-255 | Pin on board
Level   | 7 | 7 | 0-255 | Level applied to program or pin
Reading | 9 | 10 | 16 bit integer | Reading encoded as a value in suitable units

D,N,Y,T apply to programs:

  • D - Dim
  • N - Turn off
  • Y - Turn on (and set level)
  • T - Toggle

L controls individual lights; typically D,N,Y,T commands apply a number of L commands.

S commands report a reading from a Source, Pin combination.
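For illustration, a minimal Node.js decoder for this layout (a hypothetical helper; offsets follow the byte positions in the table above, counted from 1, and the byte order of the 16-bit reading is an assumption):

    // decode one fixed-format UDP packet into an object (offset = table byte - 1)
    function decodePacket(buf) {
        return {
            command: String.fromCharCode(buf[0]), // D, N, Y, T, L or S
            target:  buf[1],
            program: buf[2],
            source:  buf[3],
            reason:  buf[4],
            pin:     buf[5],
            level:   buf[6],
            reading: buf.readInt16BE(8)           // bytes 9-10; big-endian assumed
        };
    }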

UDP packets are translated to/from mqtt by the server

The mqtt packet structure is as follows:

            
                {
                    command: ,
                    target: ,
                    program: ,
                    source: ,
                    reason: ,
                    pin: ,
                    level: ,
                    reading: 
                }
            
         

The majority of packets report status via S commands; historically these have been stored in MongoDB.

Not all S-type packets are stored: there is an array in ha_mongopacket.js which restricts the packets stored. If this restriction isn't repeated, packets from other sources would also be stored.

The remaining commands D,N,Y,T,L are stored in a MySQL database, which to date has grown to 2.5 GB.

Direct ingestion

Rather than using SQL INSERT statements over pg, QuestDB supports direct ingestion via its Node.js client. This reduces server load significantly (140 W down to 90 W, so a 50 W reduction in processor power requirement) and probably increases packets/sec.
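A minimal sketch of direct ingestion with the @questdb/nodejs-client package (column names match the readings table defined later; the host, port and values are assumptions):

    const { Sender } = require('@questdb/nodejs-client');

    async function send(board, pin, reading) {
        // ILP over TCP; QuestDB's line protocol listens on 9009 by default
        const sender = Sender.fromConfig('tcp::addr=localhost:9009;');
        await sender.connect();
        await sender.table('readings')
            .symbol('board', String(board))
            .intColumn('pin', pin)
            .floatColumn('reading', reading)
            .at(Date.now(), 'ms');
        await sender.flush();
        await sender.close();
    }

    send(101, 1, 23.5).catch(console.error);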

Revised architecture

It is becoming apparent that the j4105 server could be replaced by one or more Raspberry Pis.

Functions to store a full database of 6 or more years' readings only need to be online part of the time, so could be hosted on the desktop.

At the front end all packets should be stored and retained for a short period (1 week?). As data reaches a week old, that day's partition should be dropped from the front end once it has been consolidated on the desktop.

Installing QuestDB on a Pi

This article Create an IoT server with QuestDB and a Raspberry Pi describes how to install QuestDB on a Pi.

I have installed it on a Pi 4 with a new 32 GB SD card, using the following details:

Item | Value
Operating System | Raspberry Pi OS (Bookworm) Lite 64-bit
Name | Quest01
Raspberry Pi Device | Raspberry Pi 4
SSH | Enable SSH
SSH - Option | User name and password
User name and password | jules with usual password
WiFi Network | stringeroakhouse with usual password

Follow the instructions in the article to install QuestDB 7.3 on the Pi.

The console interface for QuestDB is then available here: QuestDB

Tables on the Pi

Next we need to define some tables. If we are to record all UDP packets, then we need:

readings, programs and lights

            
    CREATE TABLE readings (
        timestamp TIMESTAMP,
        board SYMBOL,
        pin SHORT,
        reading DOUBLE
    ) timestamp(timestamp) PARTITION BY DAY;

    CREATE TABLE programs (
        timestamp TIMESTAMP,
        command SYMBOL,
        program SHORT,
        level BYTE,
        target BYTE,
        source BYTE,
        reason BYTE
    ) timestamp(timestamp) PARTITION BY DAY;

    CREATE TABLE lights (
        timestamp TIMESTAMP,
        board SYMBOL,
        pin SHORT,
        level BYTE,
        source BYTE,
        reason BYTE
    ) timestamp(timestamp) PARTITION BY DAY;
            
         

The Pi needs Node.js installed: Install nodejs on raspberry Pi

It also needs some node_modules

And quest01_packet_inserter.js

Should tables now be used just for their original purpose, now that they are easier to use? e.g.

This would enable:

Pi retention policy

On the Pi the database is stored in ~/.questdb, so the size of the database can be determined by:

                
                    jules@quest01:~ $ du -h ~/.questdb/db
                
            

Tables on the Pi will need to have the oldest data removed periodically. As tables are partitioned by day, data can be deleted efficiently with an ALTER TABLE DROP PARTITION query.
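For example, to keep a week of data (a sketch; the table name and retention period are placeholders):

    ALTER TABLE readings DROP PARTITION WHERE timestamp < dateadd('d', -7, now());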

The following retention times are suggested for different tables

Direct ingestion of readings, lights and programs

Modified versions of the scripts to load these from MQTT were written; some issues were found with these:

Autostart

In order to auto-start on boot, a service file is needed as follows:

            
                [Unit]
                Description=QuestDB Service
                After=network.target
                
                [Service]
                Environment="JAVA_HOME=/usr/lib/jvm/java-17-openjdk-arm64"
                ExecStart=/home/jules/questdb/questdb.sh start -n
                ExecStop=/home/jules/questdb/questdb.sh stop
                Restart=on-failure
                User=jules
                RestartSec=5
                WorkingDirectory=/home/jules/questdb
                
                [Install]
                WantedBy=multi-user.target
            
         

This is an adaptation of the instructions to start QuestDB in a terminal, with the addition of:

Each of the service files quest01_readings, quest01_lights and quest01_programs needs its [Unit] section modified as follows:

            
                [Unit]
                Description=QuestDB Readings Service
                After=questdb.service
                Wants=questdb.service
            
         
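With the unit files in place under /etc/systemd/system/ (path assumed), reload systemd and enable the services so they start on boot:

    sudo systemctl daemon-reload
    sudo systemctl enable --now questdb quest01_readings quest01_lights quest01_programs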

Power consumption

Power consumption with questdb, quest01_readings, quest01_lights, quest01_programs and the server (idle) varies between 0.58 A and 0.85 A, with the average being about 0.7 A, all at 5.18 V, so an average of roughly 3.6 W.

Log files

By default questdb has collected about 16 GB of log files in 4 days. A first attempt to get control of this was made using logrotate, which I installed and configured:

            
                sudo apt-get install logrotate
            
        

Logrotate needs the configuration file /etc/logrotate.d/questdb

            
    /home/jules/.questdb/log/*.txt {
        daily
        rotate 2
        compress
        delaycompress
        missingok
        notifempty
        copytruncate
        create 640 jules jules
    }
            
        

For this to take effect logrotate needs to be run, and to take effect daily it needs to be run as a cron job.

HOWEVER questdb also has its own sophisticated configuration to control logs, which is probably better than logrotate. This is set up via ~/.questdb/conf/log.conf as follows:

            
                # list of configured writers
                writers=file,stdout,http.min
                #,alert
                
                # file writer
                #w.file.class=io.questdb.log.LogFileWriter
                #w.file.location=questdb-debug.log
                #w.file.level=INFO,ERROR
                
                # rolling file writer 
                w.file.class=io.questdb.log.LogRollingFileWriter
                w.file.location=${log.dir}/questdb-rolling.log.${date:yyyyMMdd}
                w.file.level=INFO,ERROR
                #rollEvery accepts: day, hour, minute, month
                w.file.rollEvery=day
                #rollSize specifies size at which to roll a new log file: a number followed by k, m, g (KB, MB, GB respectively)
                w.file.rollSize=128m
                #lifeDuration accepts: a number followed by s, m, h, d, w, M, y for seconds, minutes, hours, etc.  
                w.file.lifeDuration=1d
                #sizeLimit is the max fileSize of the log directory. Follows same format as rollSize
                w.file.sizeLimit=1g

                # stdout
                w.stdout.class=io.questdb.log.LogConsoleWriter
                w.stdout.level=ERROR
                
                # alert
                # w.alert.class=io.questdb.log.LogAlertSocketWriter
                # w.alert.level=CRITICAL
                # w.alert.location=/alert-manager-tpt.json
                # w.alert.alertTargets=localhost:9093,209.111.255.57:2468, ... ,localhost:9999
                # w.alert.inBufferSize=2M
                # w.alert.outBufferSize=4M
                # w.alert.reconnectDelay=250 # 1/4th sec (milli precision)
                # w.alert.defaultAlertHost=127.0.0.1
                # w.alert.defaultAlertPort=9093
                # disable logging out of min http server, which is supposed to be used
                # for frequent monitoring
                w.http.min.class=io.questdb.log.LogConsoleWriter
                w.http.min.level=ERROR
                w.http.min.scope=http-min-server
            
        

After restarting QuestDB, logs are now created with names like ~/.questdb/log/questdb-rolling.log.20240910.

This creates files of no more than 128 MB and a total of 1 GB altogether, so there is no need to run logrotate.

Contents of log files

At the head of the log file from system start (head [logfile] --lines=100) there are some warnings:

Performance

After about 2 weeks running there is a noticeable deterioration in performance. It is proposed to resolve this by:

Combination tables to substantially reduce update traffic

The original readings table did an insert once per board/pin combination with quite high frequency. Stopping the quest01_readings service has dramatically reduced CPU load.

One of the main causes of traffic was current measurement for LightBoards. Instead of the readings table, a light_status table has been implemented: one MQTT packet covering all light boards is produced once a minute. With this implementation there is one row per board per minute.

There are the following classes and tables:

Class | Table | Update script | Server
LightBoard | light_status | quest01_light_status | implemented
powerwall | powerwall_meters | quest01_powerwall_meters | implemented
powerwall | powerwall_operation | quest01_powerwall_operation | implemented
powerwall | powerwall_soe | quest01_powerwall_soe | implemented
SwitchBoard | switch_status | quest01_switch_status | implemented
ThermoCouple | | |
HeatmiserV3Slave | thermostats | quest01_thermostats |
Alarm | | |
octopusrates | | |
octopususage | | |
solcast | solcast_forecast | quest01_solcast | implemented
solcast | solcast_history | quest01_solcast | implemented

There is now provision for more than one site; investigate with Solcast how many arrays we can have - can we support east and west facing roofs as well?

Class | Table | Update script | Server
Plug | | |
Car | | |
openevse | openevse | quest01_openevse |
weather | weather | quest01_weather | partial
carbonintensity | | |
batteryshunt | | |

Have either an RS485-to-USB cable to connect to the Pi, or possibly Bluetooth.

Victron IP22

Have either Bluetooth or VE.Bus pins inside to connect to a UART-to-USB converter.

solarthermal

Investigate replacing the RESOL BS/4 with one with VBus, and get either:

ctreader

Investigate fitting a cheap board to measure current for up to 4 devices from SCT-013 sensors:

  • mlabs2008
  • Need to investigate a suitable wemos esp32 or nodemcu board

Implemented a circuit which biases one leg of the CT with a potential divider across 3.3 V with an AC filter, so that DC is lifted to 1.65 V but anything over 3.8 Hz is sent to ground. The other leg of the CT is connected to an analog input of the TM4C.

As of 29/11/2024 this still needs to be debugged.

ftc6

New heat pump flow controller, with MODBUS interface, installed. The Modbus interface is implemented on the TM4C. See heating upgrade for further details.

Table | Update script | Server
ftc_daily | quest01_ftc6 | TODO
ftc_performance | quest01_ftc6 | Implemented

For each class the following need to be implemented:

And optionally

Recovery

After nearly 3 months running, several larger tables stopped updating. The first clue comes from selecting rows from these tables: all these selections ended at about 2024-11-25T15:45.

As the tables are partitioned by day, list the partitions:

            
    SHOW PARTITIONS FROM {tablename}
            
        

This confirms that all these tables ended at about the same time.

Inspecting the logs confirms the cause of failure. First we list the logs to see which one to look at:

            
jules@quest01:~/.questdb/log $ ls
questdb-rolling.log.20241128    stdout-2024-09-17T15-55-27.txt  stdout-2024-10-06T11-13-04.txt  stdout-2024-11-27T16-35-35.txt
questdb-rolling.log.20241129    stdout-2024-09-19T16-45-11.txt  stdout-2024-10-26T10-29-53.txt  stdout-2024-11-27T16-40-07.txt
stdout-2024-09-17T15-54-39.txt  stdout-2024-09-21T16-52-31.txt  stdout-2024-11-25T15-45-22.txt
stdout-2024-09-17T15-54-43.txt  stdout-2024-09-21T17-00-10.txt  stdout-2024-11-27T13-06-40.txt                            
            
        

The above listing shows that stdout-2024-11-25T15-45-22.txt seems to end at about the same time as the tables, so looking at the tail of it should show what happened.

            
jules@quest01:~/.questdb/log $ tail stdout-2024-11-25T15-45-22.txt
2024-11-25T15:45:35.754636Z A pg-server listening on 0.0.0.0:8812 [fd=111 backlog=64]
2024-11-25T15:45:35.761813Z A tcp-line-server listening on 0.0.0.0:9009 [fd=113 backlog=256]
2024-11-25T15:45:36.106464Z A server-main server is ready to be started
2024-11-25T15:45:36.210893Z A server-main enjoy
2024-11-25T15:45:36.565780Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=programs_direct~11, error=segment /home/jules/.questdb/db/programs_direct~11/wal5776/0/_event.i does not have txn with id 9, offset=611, indexFileSize=104, maxTxn=8, size=611, errno=0]
2024-11-25T15:45:36.567543Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=light_status~21, error=segment /home/jules/.questdb/db/light_status~21/wal9/4/_event.i does not have txn with id 4721, offset=316315, indexFileSize=37784, maxTxn=4720, size=316315, errno=0]
2024-11-25T15:45:36.691832Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=switch_status~24, error=segment /home/jules/.questdb/db/switch_status~24/wal4/5/_event.i does not have txn with id 100, offset=6708, indexFileSize=816, maxTxn=99, size=6708, errno=0]
2024-11-25T15:45:36.723872Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=openevse~27, error=segment /home/jules/.questdb/db/openevse~27/wal4/0/_event.i does not have txn with id 53466, offset=2673308, indexFileSize=427744, maxTxn=53465, size=2673308, errno=0]
2024-11-25T15:45:36.777000Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=powerwall_meters~18, error=segment /home/jules/.questdb/db/powerwall_meters~18/wal7113/0/_event.i does not have txn with id 1403, offset=94009, indexFileSize=11248, maxTxn=1402, size=94009, errno=0]
2024-11-25T15:45:36.802191Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=powerwall_soe~17, error=segment /home/jules/.questdb/db/powerwall_soe~17/wal1820/0/_event.i does not have txn with id 1719, offset=85958, indexFileSize=13776, maxTxn=1718, size=85958, errno=0]
            
        

Here we see that 6 tables have been suspended because of a missing transaction id.
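The suspended tables can also be listed directly from the console; wal_tables() reports each WAL table and whether it is suspended:

    SELECT name, suspended FROM wal_tables();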

Clearly I should have implemented a recovery strategy earlier!

Backup strategy

The Enterprise version of QuestDB lets you checkpoint the database, which stops updates, and then make a copy. Something similar could be achieved by stopping the database, copying it, and starting it again.

In practice the data can be extracted on a daily basis: a script selects all rows from each table for the previous day and saves the result to a JSON file. This script is extract_day.js.

extract_day.js

This is in ~/Projects/quest01/scripts; it is run as follows:

            
                node ~/Projects/quest01/scripts/extract_day.js [options]
            
        

[options]:

For each day a folder is created in ~/Projects/quest01/data , which contains each of the extracted tables.
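A minimal sketch of the extraction step (hypothetical, not the full extract_day.js; uses QuestDB's HTTP /exec endpoint on port 9000 and assumes a fixed table list and Node 18+ for fetch):

    const fs = require('fs');
    const path = require('path');

    const HOST = 'http://quest01:9000';                 // assumed QuestDB host
    const TABLES = ['readings', 'lights', 'programs']; // assumed table list

    async function extractDay(day) { // day in 'yyyy-MM-dd' form
        const dir = path.join(process.env.HOME, 'Projects/quest01/data', day);
        fs.mkdirSync(dir, { recursive: true });
        for (const table of TABLES) {
            // interval search: timestamp IN 'yyyy-MM-dd' selects the whole day
            const sql = `SELECT * FROM ${table} WHERE timestamp IN '${day}'`;
            const res = await fetch(`${HOST}/exec?query=${encodeURIComponent(sql)}`);
            const json = await res.json(); // { columns, dataset, count, ... };
                                           // a limit parameter may be needed for very large days
            fs.writeFileSync(path.join(dir, table + '.json'), JSON.stringify(json));
        }
    }

    extractDay(process.argv[2]).catch(console.error);

The real script also accepts options such as "yesterday" (see the cron entry below); resolving those to a date is omitted here.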

Cron job

A cron job has been created on the desktop which runs 5 minutes after boot. The entry in crontab -e is as follows:

            
                @reboot /usr/bin/sleep 300 && /usr/bin/node /home/jules/Projects/quest01/scripts/extract_day.js yesterday            
            
        

Repairing tables

Without further action suspended tables cannot be written to. The procedure for repairing tables, in this case at least, is to drop the broken partition and possibly rebuild it.
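The exact commands are not recorded here; a sketch (the table name and date are placeholders, and RESUME WAL lifts the suspension so writes can continue):

    ALTER TABLE light_status DROP PARTITION LIST '2024-11-25';
    ALTER TABLE light_status RESUME WAL;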

After which you may restore the missing data.

Potential recovery improvements: