QuestDB migration for IoT data

The reason for considering this is to enable IoT data to be captured on a Raspberry Pi, with consolidation to the desktop.

If successful, the existing MongoDB database would be migrated to the desktop.

Evaluation

Some criteria that should be met:

Installation

Assume installation using Docker (at least to start with):

sudo docker run -p 9000:9000 -p 8812:8812 questdb/questdb

Getting started

Basic information

Web Console | Web Console
Web Console (remote) | Web Console (Desktop)
Postgres-compatible port | 8812
File storage | /var/lib/questdb/conf (is this where data is stored?)

Does it need to stay resident, i.e. is it only online while it is running?

To run QuestDB

docker run -d -p 9000:9000 -p 8812:8812 --name questdb questdb/questdb

Create tables

            
    CREATE TABLE packet_data (
        timestamp TIMESTAMP,
        board SYMBOL,
        pin SYMBOL,
        reading DOUBLE
    ) timestamp(timestamp) PARTITION BY DAY;
            
        

This could be input either via the web console or over the Postgres-compatible port (8812).

PARTITION BY DAY stores each day's rows in a separate partition; this keeps out-of-sequence updates cheap (only the affected day's partition is touched) and, later, lets old data be dropped a day at a time.

Loading Data

Wrote a script to capture ha_insert_packet MQTT packets from the network; this has a number of options:

This means that the Pi must be able to store at least 35 packets/sec.

Now need another script to transfer data from MongoDB.

To find the storage used, run du -h against the QuestDB data directory (for the Docker image this is under /var/lib/questdb).

Notice that several CPU cores run at 4.8 GHz and i7 power consumption reaches up to 90 W whilst inserting packets into the database.

Attractive features of QuestDB

QuestDB Website

Loading data from MongoDB


The database in MongoDB is stored in:

database | ha_log
collection | logcache2

The structure of each document in logcache2 is as follows:

            
    {
        board: 101,   // board id for document
        pin: 1,       // pin number
        pstart: ???,  // start timestamp (datetime)
        pend: ???,    // end timestamp (datetime)
        result: [
            {
                T: ???, // timestamp for reading, from d.getTime()
                R: ???  // reading in suitable integer units (milliamps, 1/10 degree, etc.)
            }
            // ... one entry per reading
        ]
    } // each document contains results where T lies between pstart and pend
            
        
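Transferring this into QuestDB means fanning each document's result array out into one row per reading. A minimal sketch of the idea (hypothetical code, not the actual script described below; assumes the mongodb and pg npm packages and QuestDB's default Postgres credentials admin/quest on port 8812):

    const { MongoClient } = require('mongodb');
    const { Client } = require('pg');

    async function transfer(start, end) {
        const mongo = await MongoClient.connect('mongodb://localhost:27017');
        const pg = new Client({ host: 'localhost', port: 8812,
                                user: 'admin', password: 'quest', database: 'qdb' });
        await pg.connect();

        // documents whose time range falls inside [start, end)
        const docs = mongo.db('ha_log').collection('logcache2')
            .find({ pstart: { $gte: start }, pend: { $lt: end } });

        for await (const doc of docs) {
            // one row in packet_data per entry in the document's result array
            for (const r of doc.result) {
                await pg.query(
                    'INSERT INTO packet_data VALUES ($1, $2, $3, $4)',
                    [new Date(r.T), String(doc.board), String(doc.pin), r.R]);
            }
        }
        await mongo.close();
        await pg.end();
    }

    transfer(new Date('2024-09-01T23:00Z'), new Date('2024-09-02T23:00Z'))
        .catch(console.error);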

Wrote a script along these lines, load_from_mongodb.js, to load ha_log/logcache2 data into QuestDB by date range. Loading a couple of days' data gave the following results:

Start date       | End date         | Packets | Packets/sec | Size (MB) | Elapsed (m:s)
2024-09-02T23:00 | 2024-09-04T19:59 | 697222  | 357         | ??        | ??
2024-09-01T23:00 | 2024-09-02T23:00 | 458231  | 360         | 260       | 21:23
2024-08-20T23:00 | 2024-09-01T23:00 | 5411492 | 331         | 1103.984  | 272:40

Summary of results

Total elapsed time | 294m 03s
Packets loaded | 5411492
Storage required | 1103.982 MB
Elapsed time represented | 14 days 20:39 (14.86 days)
Approx storage required for a day | 75 MB
Approx storage required for a year | 27 GB
Approx storage for 6 years | 163 GB
Elapsed time to load 1 year | 112 hours
Elapsed time to load 6 years | 670 hours

The data occupies about 35 GB when stored in MongoDB, so at present there aren't enough reasons to justify doing this.

It is clear from this loading exercise that the limit on loading rate is the rate at which packets are received, rather than the rate at which they can be written to the database.

BUT some substantial improvements can be made by choosing the most appropriate data types:

Column | Current size (bytes) | New type | New size (bytes)
timestamp | 8 | TIMESTAMP | 8
board | 4 | SYMBOL | 4
pin | 4 | SHORT | 2
reading | 8 | SHORT | 2
TOTAL | 24 | | 16

reading as SHORT comes from the UDP packet layout; MQTT packets could carry readings that require more storage.

Making only board a SYMBOL drops the index on pin, which will save space.
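A revised definition along these lines (a sketch applying the table above to the original packet_data DDL):

    CREATE TABLE packet_data (
        timestamp TIMESTAMP,
        board SYMBOL,
        pin SHORT,
        reading SHORT
    ) timestamp(timestamp) PARTITION BY DAY;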

What data is stored

UDP packets are delivered from up to 2 Ethernet-connected boards at each position. These UDP packets have a fixed format as follows:

Item    | Start byte | End byte | Possible values | Description
Command | 1 | 1 | D,N,Y,T,L,S | Command, which determines the action and the meaning of the remainder of the packet
Target  | 2 | 2 | 0 to 255 | Intended recipient of packet
Program | 3 | 3 | 0 to 255 | Used by Y,N,T,D commands to control a set of things that are grouped together (typically lights)
Source  | 4 | 4 | 0 to 255 | Board or device that sourced the packet
Reason  | 5 | 5 | Code | Reason code (see source for interpretation)
Pin     | 6 | 6 | 0-255 | Pin on board
Level   | 7 | 7 | 0-255 | Level applied to program or pin
Reading | 9 | 10 | 16 bit integer | Reading encoded as a value in suitable units

D,N,Y,T apply to programs:

  • D - Dim
  • N - Turn off
  • Y - Turn on (and set level)
  • T - Toggle

L controls individual lights; typically D,N,Y,T commands apply a number of L commands.

S commands report a reading from a Source, Pin combination.
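For illustration, a minimal Node.js decoder for this layout (a hypothetical helper; offsets follow the byte positions in the table above, counted from 1, and the byte order of the 16-bit reading is an assumption):

    // decode one fixed-format UDP packet into an object (offset = table byte - 1)
    function decodePacket(buf) {
        return {
            command: String.fromCharCode(buf[0]), // D, N, Y, T, L or S
            target:  buf[1],
            program: buf[2],
            source:  buf[3],
            reason:  buf[4],
            pin:     buf[5],
            level:   buf[6],
            reading: buf.readInt16BE(8)           // bytes 9-10; big-endian assumed
        };
    }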

UDP packets are translated to/from mqtt by the server

The mqtt packet structure is as follows:

            
                {
                    command: ,
                    target: ,
                    program: ,
                    source: ,
                    reason: ,
                    pin: ,
                    level: ,
                    reading: 
                }
            
         

The majority of packets report status via S commands; historically these have been stored in MongoDB.

Not all S-type packets are stored: there is an array in ha_mongopacket.js which restricts the packets stored. If this restriction isn't repeated, packets from other sources would also be stored.

The remaining commands D,N,Y,T,L are stored in a MySQL database, which to date has grown to 2.5 GB.

Direct ingestion

Rather than using SQL INSERT statements over pg, QuestDB supports direct ingestion via its Node.js client. This reduces server load significantly (140 W down to 90 W, so a 50 W reduction in processor power requirement) and probably increases packets/sec.
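A minimal sketch of direct ingestion with the @questdb/nodejs-client package (column names match the readings table defined later; the host, port and values are assumptions):

    const { Sender } = require('@questdb/nodejs-client');

    async function send(board, pin, reading) {
        // ILP over TCP; QuestDB's line protocol listens on 9009 by default
        const sender = Sender.fromConfig('tcp::addr=localhost:9009;');
        await sender.connect();
        await sender.table('readings')
            .symbol('board', String(board))
            .intColumn('pin', pin)
            .floatColumn('reading', reading)
            .at(Date.now(), 'ms');
        await sender.flush();
        await sender.close();
    }

    send(101, 1, 23.5).catch(console.error);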

Revised architecture

It is becoming apparent that the j4105 server could be replaced by one or more Raspberry Pis.

Functions to store a full database of 6 or more years' readings only need to be online part of the time, so could be hosted on the desktop.

At the front end all packets should be stored and retained for a short period (1 week?). As data reaches a week old, that day's partition should be dropped from the front end once it has been consolidated on the desktop.

Installing QuestDB on a Pi

This article Create an IoT server with QuestDB and a Raspberry Pi describes how to install QuestDB on a Pi.

I have installed it on a Pi 4 with a new 32 GB SD card, using the following details:

Item | Value
Operating System | Raspberry Pi OS (Bookworm) Lite 64-bit
Name | Quest01
Raspberry Pi Device | Raspberry Pi 4
SSH | Enable SSH
SSH - Option | User name and password
User name and password | jules with usual password
WiFi Network | stringeroakhouse with usual password

Follow the instructions in the article to install QuestDB 7.3 on the Pi.

The console interface for QuestDB is then available here: QuestDB

Tables on the Pi

Next we need to define some tables. If we are to record all UDP packets, then we need:

readings, programs and lights

            
    CREATE TABLE readings (
        timestamp TIMESTAMP,
        board SYMBOL,
        pin SHORT,
        reading DOUBLE
    ) timestamp(timestamp) PARTITION BY DAY;

    CREATE TABLE programs (
        timestamp TIMESTAMP,
        command SYMBOL,
        program SHORT,
        level BYTE,
        target BYTE,
        source BYTE,
        reason BYTE
    ) timestamp(timestamp) PARTITION BY DAY;

    CREATE TABLE lights (
        timestamp TIMESTAMP,
        board SYMBOL,
        pin SHORT,
        level BYTE,
        source BYTE,
        reason BYTE
    ) timestamp(timestamp) PARTITION BY DAY;
            
         

The Pi needs Node.js installed: Install nodejs on raspberry Pi

It also needs some node_modules

And quest01_packet_inserter.js

Should tables now be used just for their original purpose, now that they are easier to use? e.g.

This would enable:

Pi retention policy

On the Pi the database is stored in ~/.questdb, so the size of the database can be determined by:

                
                    jules@quest01:~ $ du -h ~/.questdb/db
                
            

Tables on the Pi will need to have the oldest data removed periodically. As tables are partitioned by day, data can be deleted efficiently with an ALTER TABLE DROP PARTITION query.
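For example, to keep a week of data (a sketch; the table name and retention period are placeholders):

    ALTER TABLE readings DROP PARTITION WHERE timestamp < dateadd('d', -7, now());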

The following retention times are suggested for different tables

Direct ingestion of readings, lights and programs

Modified versions of the scripts to load these from MQTT were written; some issues were found with these:

Autostart

In order to auto-start on boot, a service file is needed as follows:

            
                [Unit]
                Description=QuestDB Service
                After=network.target
                
                [Service]
                Environment="JAVA_HOME=/usr/lib/jvm/java-17-openjdk-arm64"
                ExecStart=/home/jules/questdb/questdb.sh start -n
                ExecStop=/home/jules/questdb/questdb.sh stop
                Restart=on-failure
                User=jules
                RestartSec=5
                WorkingDirectory=/home/jules/questdb
                
                [Install]
                WantedBy=multi-user.target
            
         

This is an adaptation of the instructions to start QuestDB in a terminal, with the addition of:

Each of the service files quest01_readings, quest01_lights and quest01_programs needs its [Unit] section modified as follows:

            
                [Unit]
                Description=QuestDB Readings Service
                After=questdb.service
                Wants=questdb.service
            
         
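With the unit files in place under /etc/systemd/system/ (path assumed), reload systemd and enable the services so they start on boot:

    sudo systemctl daemon-reload
    sudo systemctl enable --now questdb quest01_readings quest01_lights quest01_programs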

Power consumption

Power consumption with questdb, quest01_readings, quest01_lights, quest01_programs and the server (idle) varies between 0.58 A and 0.85 A, with the average being about 0.7 A, all at 5.18 V, so an average of roughly 3.6 W.

Log files

By default questdb has collected about 16 GB of log files in 4 days. A first attempt to get control of this was made using logrotate, which I installed and configured:

            
                sudo apt-get install logrotate
            
        

Logrotate needs the configuration file /etc/logrotate.d/questdb

            
    /home/jules/.questdb/log/*.txt {
        daily
        rotate 2
        compress
        delaycompress
        missingok
        notifempty
        copytruncate
        create 640 jules jules
    }
            
        

For this to take effect logrotate needs to be run, and to take effect daily it needs to be run as a cron job.

HOWEVER questdb also has its own sophisticated configuration to control logs, which is probably better than logrotate. This is set up via ~/.questdb/conf/log.conf as follows:

            
                # list of configured writers
                writers=file,stdout,http.min
                #,alert
                
                # file writer
                #w.file.class=io.questdb.log.LogFileWriter
                #w.file.location=questdb-debug.log
                #w.file.level=INFO,ERROR
                
                # rolling file writer 
                w.file.class=io.questdb.log.LogRollingFileWriter
                w.file.location=${log.dir}/questdb-rolling.log.${date:yyyyMMdd}
                w.file.level=INFO,ERROR
                #rollEvery accepts: day, hour, minute, month
                w.file.rollEvery=day
                #rollSize specifies size at which to roll a new log file: a number followed by k, m, g (KB, MB, GB respectively)
                w.file.rollSize=128m
                #lifeDuration accepts: a number followed by s, m, h, d, w, M, y for seconds, minutes, hours, etc.  
                w.file.lifeDuration=1d
                #sizeLimit is the max fileSize of the log directory. Follows same format as rollSize
                w.file.sizeLimit=1g

                # stdout
                w.stdout.class=io.questdb.log.LogConsoleWriter
                w.stdout.level=ERROR
                
                # alert
                # w.alert.class=io.questdb.log.LogAlertSocketWriter
                # w.alert.level=CRITICAL
                # w.alert.location=/alert-manager-tpt.json
                # w.alert.alertTargets=localhost:9093,209.111.255.57:2468, ... ,localhost:9999
                # w.alert.inBufferSize=2M
                # w.alert.outBufferSize=4M
                # w.alert.reconnectDelay=250 # 1/4th sec (milli precision)
                # w.alert.defaultAlertHost=127.0.0.1
                # w.alert.defaultAlertPort=9093
                # disable logging out of min http server, which is supposed to be used
                # for frequent monitoring
                w.http.min.class=io.questdb.log.LogConsoleWriter
                w.http.min.level=ERROR
                w.http.min.scope=http-min-server
            
        

After restarting QuestDB, logs are now created with names like ~/.questdb/log/questdb-rolling.log.20240910.

This creates files of no more than 128 MB and a total of 1 GB altogether, so there is no need to run logrotate.

Contents of log files

At the head of the log file from system start (head [logfile] --lines=100) there are some warnings:

Performance

After about 2 weeks running there is a noticeable deterioration in performance. It is proposed to resolve this by:

Combination tables to substantially reduce update traffic

The original readings table did an insert once per board/pin combination with quite high frequency. Stopping the quest01_readings service has dramatically reduced CPU load.

One of the main causes of traffic was current measurement for LightBoards. Instead of the readings table, a light_status table has been implemented: one MQTT packet covering all light boards is produced once a minute. With this implementation there is one row per board per minute.

There are the following classes and tables:

Class | Table | Update script | Server
LightBoard | light_status | quest01_light_status | implemented
powerwall | powerwall_meters | quest01_powerwall_meters | implemented
powerwall | powerwall_operation | quest01_powerwall_operation | implemented
powerwall | powerwall_soe | quest01_powerwall_soe | implemented
SwitchBoard | switch_status | quest01_switch_status | implemented
ThermoCouple | | |
HeatmiserV3Slave | thermostats | quest01_thermostats |
Alarm | | |
octopusrates | | |
octopususage | | |
solcast | solcast_forecast | quest01_solcast | implemented
solcast | solcast_history | quest01_solcast | implemented

There is now provision for more than one site; investigate with Solcast how many arrays we can have - can we support east and west facing roofs as well?

Class | Table | Update script | Server
Plug | | |
Car | | |
openevse | openevse | quest01_openevse |
weather | weather | quest01_weather | partial
carbonintensity | | |
batteryshunt | | |

Have either an RS485-to-USB cable to connect to the Pi, or possibly Bluetooth.

Victron IP22

Have either Bluetooth or VE.Bus pins inside to connect to a UART-to-USB converter.

solarthermal

Investigate replacing the RESOL BS/4 with one with VBus, and get either:

ctreader

Investigate fitting a cheap board to measure current for up to 4 devices from SCT-013 sensors:

  • mlabs2008
  • Need to investigate a suitable wemos esp32 or nodemcu board

Implemented a circuit which biases one leg of the CT with a potential divider across 3.3 V with an AC filter, so that DC is lifted to 1.65 V but anything over 3.8 Hz is sent to ground. The other leg of the CT is connected to an analog input of the TM4C.

As of 29/11/2024 this still needs to be debugged.

ftc6

New heat pump flow controller, with MODBUS interface, installed. The Modbus interface is implemented on the TM4C. See heating upgrade for further details.

Table | Update script | Server
ftc_daily | quest01_ftc6 | TODO
ftc_performance | quest01_ftc6 | Implemented

For each class the following need to be implemented:

And optionally

Recovery

After nearly 3 months running, several larger tables stopped updating. The first clue comes from selecting rows from these tables: all these selections ended at about 2024-11-25T15:45.

As the tables are partitioned by day, list the partitions:

            
    SHOW PARTITIONS FROM {tablename}
            
        

This confirms that all these tables ended at about the same time.

Inspecting the logs confirms the cause of failure. First we list the logs to see which one to look at:

            
jules@quest01:~/.questdb/log $ ls
questdb-rolling.log.20241128    stdout-2024-09-17T15-55-27.txt  stdout-2024-10-06T11-13-04.txt  stdout-2024-11-27T16-35-35.txt
questdb-rolling.log.20241129    stdout-2024-09-19T16-45-11.txt  stdout-2024-10-26T10-29-53.txt  stdout-2024-11-27T16-40-07.txt
stdout-2024-09-17T15-54-39.txt  stdout-2024-09-21T16-52-31.txt  stdout-2024-11-25T15-45-22.txt
stdout-2024-09-17T15-54-43.txt  stdout-2024-09-21T17-00-10.txt  stdout-2024-11-27T13-06-40.txt                            
            
        

The above listing shows that stdout-2024-11-25T15-45-22.txt seems to end at about the same time as the tables, so looking at the tail of it should show what happened.

            
jules@quest01:~/.questdb/log $ tail stdout-2024-11-25T15-45-22.txt
2024-11-25T15:45:35.754636Z A pg-server listening on 0.0.0.0:8812 [fd=111 backlog=64]
2024-11-25T15:45:35.761813Z A tcp-line-server listening on 0.0.0.0:9009 [fd=113 backlog=256]
2024-11-25T15:45:36.106464Z A server-main server is ready to be started
2024-11-25T15:45:36.210893Z A server-main enjoy
2024-11-25T15:45:36.565780Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=programs_direct~11, error=segment /home/jules/.questdb/db/programs_direct~11/wal5776/0/_event.i does not have txn with id 9, offset=611, indexFileSize=104, maxTxn=8, size=611, errno=0]
2024-11-25T15:45:36.567543Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=light_status~21, error=segment /home/jules/.questdb/db/light_status~21/wal9/4/_event.i does not have txn with id 4721, offset=316315, indexFileSize=37784, maxTxn=4720, size=316315, errno=0]
2024-11-25T15:45:36.691832Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=switch_status~24, error=segment /home/jules/.questdb/db/switch_status~24/wal4/5/_event.i does not have txn with id 100, offset=6708, indexFileSize=816, maxTxn=99, size=6708, errno=0]
2024-11-25T15:45:36.723872Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=openevse~27, error=segment /home/jules/.questdb/db/openevse~27/wal4/0/_event.i does not have txn with id 53466, offset=2673308, indexFileSize=427744, maxTxn=53465, size=2673308, errno=0]
2024-11-25T15:45:36.777000Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=powerwall_meters~18, error=segment /home/jules/.questdb/db/powerwall_meters~18/wal7113/0/_event.i does not have txn with id 1403, offset=94009, indexFileSize=11248, maxTxn=1402, size=94009, errno=0]
2024-11-25T15:45:36.802191Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=powerwall_soe~17, error=segment /home/jules/.questdb/db/powerwall_soe~17/wal1820/0/_event.i does not have txn with id 1719, offset=85958, indexFileSize=13776, maxTxn=1718, size=85958, errno=0]
            
        

Here we see that 6 tables have been suspended because of a missing transaction id.
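The suspended tables can also be listed directly from the console; wal_tables() reports each WAL table and whether it is suspended:

    SELECT name, suspended FROM wal_tables();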

Clearly I should have implemented a recovery strategy earlier!

Backup strategy

The Enterprise version of QuestDB lets you checkpoint the database, which stops updates, and then make a copy. Something similar could be achieved by stopping the database, copying it, and starting it again.

In practice the data can be extracted on a daily basis: a script selects all rows from each table for the previous day and saves the result to a JSON file. This script is extract_day.js.

extract_day.js

This is in ~/Projects/quest01/scripts; it is run as follows:

            
                node ~/Projects/quest01/scripts/extract_day.js [options]
            
        

[options]:

For each day a folder is created in ~/Projects/quest01/data , which contains each of the extracted tables.
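A minimal sketch of the extraction step (hypothetical, not the full extract_day.js; uses QuestDB's HTTP /exec endpoint on port 9000 and assumes a fixed table list and Node 18+ for fetch):

    const fs = require('fs');
    const path = require('path');

    const HOST = 'http://quest01:9000';                 // assumed QuestDB host
    const TABLES = ['readings', 'lights', 'programs']; // assumed table list

    async function extractDay(day) { // day in 'yyyy-MM-dd' form
        const dir = path.join(process.env.HOME, 'Projects/quest01/data', day);
        fs.mkdirSync(dir, { recursive: true });
        for (const table of TABLES) {
            // interval search: timestamp IN 'yyyy-MM-dd' selects the whole day
            const sql = `SELECT * FROM ${table} WHERE timestamp IN '${day}'`;
            const res = await fetch(`${HOST}/exec?query=${encodeURIComponent(sql)}`);
            const json = await res.json(); // { columns, dataset, count, ... };
                                           // a limit parameter may be needed for very large days
            fs.writeFileSync(path.join(dir, table + '.json'), JSON.stringify(json));
        }
    }

    extractDay(process.argv[2]).catch(console.error);

The real script also accepts options such as "yesterday" (see the cron entry below); resolving those to a date is omitted here.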

Cron job

A cron job has been created on the desktop which runs 5 minutes after boot. The entry in crontab -e is as follows:

            
                @reboot /usr/bin/sleep 300 && /usr/bin/node /home/jules/Projects/quest01/scripts/extract_day.js yesterday            
            
        

Repairing tables

Without further action suspended tables cannot be written to. The procedure for repairing tables, in this case at least, is to drop the broken partition and possibly rebuild it.
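The exact commands are not recorded here; a sketch (the table name and date are placeholders, and RESUME WAL lifts the suspension so writes can continue):

    ALTER TABLE light_status DROP PARTITION LIST '2024-11-25';
    ALTER TABLE light_status RESUME WAL;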

After which you may restore the missing data.

Potential recovery improvements: