The reason for considering this is to enable IoT data to be captured on a Raspberry Pi, with consolidation to the desktop.
If successful, the existing MongoDB database would be migrated to the desktop.
Some criteria that should be met:
Assume installation using Docker (at least to start with):
sudo docker run -p 9000:9000 -p 8812:8812 questdb/questdb
Basic information
Item | Value | Notes |
---|---|---|
Web Console | Web Console | |
Web Console (remote) | Web Console (Desktop) | |
Postgres compatible port | :8812 | |
File storage | /var/lib/questdb/conf | Is this where data is stored? |
Does it need to stay resident, i.e. is it online whenever it is running? Running the container detached keeps it running in the background:
docker run -d -p 9000:9000 -p 8812:8812 --name questdb questdb/questdb
CREATE TABLE packet_data (
timestamp TIMESTAMP,
board SYMBOL,
pin SYMBOL,
reading DOUBLE
) timestamp(timestamp) PARTITION BY DAY;
This could either be entered via the web console or sent to the server some other way (for example via the HTTP /exec endpoint, as sketched below).
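QuestDB also accepts SQL over its REST API on port 9000; a minimal sketch (assuming Node 18+ for the built-in fetch and the default local port) of creating the table via the /exec endpoint:

// Sketch: create the packet_data table via QuestDB's HTTP /exec endpoint
// (same SQL as above; assumes QuestDB is reachable on localhost:9000).
const sql = `CREATE TABLE IF NOT EXISTS packet_data (
  timestamp TIMESTAMP,
  board SYMBOL,
  pin SYMBOL,
  reading DOUBLE
) timestamp(timestamp) PARTITION BY DAY`;

fetch('http://localhost:9000/exec?query=' + encodeURIComponent(sql))
  .then((res) => res.json())
  .then((body) => console.log(body))   // small JSON acknowledgement on success
  .catch(console.error);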
PARTITION BY DAY stores each day's data in its own partition, which means the table can be updated out of sequence efficiently and old days can later be dropped cheaply.
Wrote a script to capture ha_insert_packet MQTT packets from the network; this has a number of options:
This means that the Pi must be able to store at least 35 packets/sec.
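The actual script is not reproduced here; a minimal sketch of this kind of capture, assuming the mqtt and pg npm packages, a broker on localhost and QuestDB's default Postgres credentials, might look like this:

// Sketch only: subscribe to ha_insert_packet and write readings to packet_data
// over the Postgres wire protocol (port 8812, default admin/quest credentials).
// Broker address, topic handling and error handling are simplified assumptions.
const mqtt = require('mqtt');
const { Client } = require('pg');

async function main() {
  const db = new Client({ host: 'localhost', port: 8812, user: 'admin', password: 'quest', database: 'qdb' });
  await db.connect();

  const broker = mqtt.connect('mqtt://localhost');
  broker.on('connect', () => broker.subscribe('ha_insert_packet'));
  broker.on('message', async (_topic, payload) => {
    const p = JSON.parse(payload.toString());
    if (p.command !== 'S') return;   // only S packets carry readings
    await db.query(
      'INSERT INTO packet_data (timestamp, board, pin, reading) VALUES ($1, $2, $3, $4)',
      [new Date(), String(p.source), String(p.pin), p.reading]
    );
  });
}

main().catch(console.error);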
Now we need another script to transfer data from MongoDB.
Notice that several CPU cores are at 4.8 GHz and i7 power consumption is up to 90 W whilst inserting packets into the database.
The MongoDB data is stored in:
Database | ha_log |
Collection | logcache2 |
The structure of each document in logcache2 is as follows:
{
    board: 101,    // board id for document
    pin: 1,        // pin number
    pstart: ???,   // start timestamp (datetime)
    pend: ???,     // end timestamp (datetime)
    result: [
        {
            T: ???,    // timestamp for reading, from d.getTime()
            R: ???     // reading in suitable integer units (milliamps, 1/10 degree, etc.)
        }
    ]
}   // each document contains results where T lies between pstart and pend
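The transfer amounts to flattening each logcache2 document into one QuestDB row per reading. A minimal sketch of that step (connection strings, date range handling and batching are placeholders; the real script described next has more options) is:

// Sketch only: flatten each logcache2 document into one row per reading.
const { MongoClient } = require('mongodb');
const { Client } = require('pg');

async function transfer(start, end) {
  const mongo = await MongoClient.connect('mongodb://localhost:27017');
  const qdb = new Client({ host: 'localhost', port: 8812, user: 'admin', password: 'quest', database: 'qdb' });
  await qdb.connect();

  const cursor = mongo.db('ha_log').collection('logcache2')
    .find({ pstart: { $gte: start }, pend: { $lte: end } });

  for await (const doc of cursor) {
    for (const r of doc.result) {   // r.T = reading timestamp, r.R = reading value
      await qdb.query(
        'INSERT INTO packet_data (timestamp, board, pin, reading) VALUES ($1, $2, $3, $4)',
        [new Date(r.T), String(doc.board), String(doc.pin), r.R]
      );
    }
  }

  await qdb.end();
  await mongo.close();
}

transfer(new Date('2024-09-01T23:00Z'), new Date('2024-09-02T23:00Z')).catch(console.error);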
Wrote a script, load_from_mongodb.js, to load ha_log/logcache2 data into QuestDB by date range. A couple of days of data have now been loaded, and the following was observed:
Start date | End Date | Packets | Packets/sec | size(MB) | Elapsed (m:s) |
---|---|---|---|---|---|
2024-09-02T23:00 | 2024-09-04T19:59 | 697222 | 357 | ? | ? |
2024-09-01T23:00 | 2024-09-02T23:00 | 458231 | 360 | 260 | 21:23 |
2024-08-20T23:00 | 2024-09-01T23:00 | 5411492 | 331 | 1103.984 | 272:40 |
Summary of results
Item | Value |
---|---|
Total elapsed time | 294m 03s |
Packets loaded | 5411492 |
Storage required | 1103.982 MB |
Elapsed time represented | 14 days 20:39 (14.86 days) |
Approx storage required for a day | 75 MB |
Approx storage required for a year | 27 GB |
Approx storage for 6 years | 163 GB |
Elapsed time to load 1 year | 112 hours |
Elapsed time to load 6 years | 670 hours |
The data occupies about 35 GB when stored in MongoDB, so at present there aren't enough reasons to justify doing this.
It is clear from this loading exercise that the limit on loading rate is the rate of receipt of packets, rather than the rate of writing them to the database.
BUT some substantial improvements can be made by choosing the most appropriate data types:
Column | Current Size (bytes) | New Type | New Size (bytes) |
---|---|---|---|
timestamp | 8 | TIMESTAMP | 8 |
board | 4 | SYMBOL | 4 |
pin | 4 | SHORT | 2 |
reading | 8 | SHORT | 2 |
TOTAL | 24 | | 14 |
Reading as a SHORT follows from the UDP packet layout; MQTT packets could carry readings that require more storage.
Making only board a SYMBOL drops the index on pin, which will save space.
UDP packets are delivered from up to 2 ethernet connected boards at each position. These UDP packets are fixed format as follows:
Item | Start byte | End Byte | Possible values | Description |
---|---|---|---|---|
Command | 1 | 1 | D,N,Y,T,L,S | Command which determines the action and the meaning of the remainder of the packet. D,N,Y,T commands apply to programs; L controls individual lights (typically a D/N/Y/T command applies a number of L commands); S commands report a reading from a Source, Pin combination |
Target | 2 | 2 | 0 to 255 | Intended recipient of packet |
Program | 3 | 3 | 0 to 255 | Used by Y,N,T,D commands to control a set of things that are grouped together (typically lights) |
Source | 4 | 4 | 0 to 255 | Board or device that sourced the packet |
Reason | 5 | 5 | Code | Reason code (see source for interpretation) |
Pin | 6 | 6 | 0-255 | Pin on board |
Level | 7 | 7 | 0-255 | Level applied to program or pin |
Reading | 9 | 10 | 16 bit integer | Reading encoded as a value in suitable units |
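For reference, a sketch of decoding this fixed format in Node.js. The table's byte positions are 1-based while Buffer offsets are 0-based, and the big-endian signed encoding of the reading is an assumption that should be checked against the board firmware:

// Sketch: decode a raw UDP packet using the layout above.
function decodePacket(buf) {
  return {
    command: String.fromCharCode(buf[0]),   // D, N, Y, T, L or S
    target:  buf[1],
    program: buf[2],
    source:  buf[3],
    reason:  buf[4],
    pin:     buf[5],
    level:   buf[6],
    reading: buf.readInt16BE(8)             // bytes 9-10; endianness/signedness assumed
  };
}

const dgram = require('dgram');
const sock = dgram.createSocket('udp4');
sock.on('message', (msg) => console.log(decodePacket(msg)));
sock.bind(12345);                            // illustrative port only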
UDP packets are translated to/from MQTT by the server.
The MQTT packet structure is as follows:
{
    command: ,   // D, N, Y, T, L or S
    target: ,    // intended recipient (0 to 255)
    program: ,   // program number, used by Y,N,T,D commands
    source: ,    // board or device that sourced the packet
    reason: ,    // reason code
    pin: ,       // pin on board
    level: ,     // level applied to program or pin
    reading:     // reading encoded as a value in suitable units
}
The majority of packets report status via S commands; historically these have been stored in MongoDB.
Not all S-type packets are stored: an array in ha_mongopacket.js restricts which packets are stored. If this restriction isn't repeated, packets from other sources would also be stored.
The remaining commands (D,N,Y,T,L) are stored in a MySQL database, which to date has grown to 2.5 GB.
Rather than using SQL INSERT statements, QuestDB supports direct ingestion through its Node.js API rather than pg. This reduces server load significantly (140 W down to 90 W, a 50 W reduction in processor power requirement) and probably increases packets/sec.
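A minimal sketch of ingestion via the official @questdb/nodejs-client Sender over HTTP on port 9000 (the column values shown are illustrative):

// Sketch: ingest a row with the QuestDB Node.js client instead of a SQL INSERT.
const { Sender } = require('@questdb/nodejs-client');

async function run() {
  const sender = Sender.fromConfig('http::addr=localhost:9000');
  await sender
    .table('packet_data')
    .symbol('board', '101')
    .symbol('pin', '1')
    .floatColumn('reading', 235)
    .at(Date.now(), 'ms');    // designated timestamp in milliseconds
  await sender.flush();
  await sender.close();
}

run().catch(console.error);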
It is becoming apparent that the J4105 server could be replaced by one or more Raspberry Pis.
The function of storing a full database of 6 or more years of readings only needs to be online part of the time, so it could be hosted on the desktop.
At the front end all packets should be stored and retained for a short period (1 week?); as data reaches a week old, that day's partition should be dropped from the front end once it has been consolidated on the desktop.
This article Create an IoT server with QuestDB and a Raspberry Pi describes how to install QuestDB on a Pi.
I have installed it on a Pi 4 with a new 32 GB SD card, using the following details:
Item | Value |
---|---|
Operating System | Raspberry Pi OS (bookworm) Lite 64 Bit |
Name | Quest01 |
Raspberry Pi Device | Raspberry Pi 4 |
SSH | Enable SSH |
SSH - Option | User Name and Password |
User name and password | jules with usual password |
Wifi Network | stringeroakhouse with usual password |
Follow the instructions in the article to install QuestDB 7.3 on the Pi.
The web console for QuestDB is then available here: QuestDB
Next we need to define some tables; if we are to record all UDP packets, then we need:
readings
CREATE TABLE readings (
timestamp TIMESTAMP,
board SYMBOL,
pin SHORT,
reading DOUBLE
) timestamp(timestamp) PARTITION BY DAY;
programs
CREATE TABLE programs (
timestamp TIMESTAMP,
command SYMBOL,
program SHORT,
level BYTE,
target BYTE,
source BYTE,
reason BYTE
) timestamp(timestamp) PARTITION BY DAY;
lights
CREATE TABLE lights (
timestamp TIMESTAMP,
board SYMBOL,
pin SHORT,
level BYTE,
source BYTE,
reason BYTE
) timestamp(timestamp) PARTITION BY DAY;
The Pi needs Node.js installed: Install nodejs on raspberry Pi
It also needs some node_modules,
and quest01_packet_inserter.js.
Should tables now be kept to just their original purpose, now that they are easier to use? e.g.
This would enable:
On the Pi the database is stored in ~/.questdb, so the size of the database can be determined by:
jules@quest01:~ $ du -h ~/.questdb/db
Tables on the Pi will need to have the oldest data removed periodically; as tables are partitioned by day, data can be deleted efficiently with an ALTER TABLE DROP PARTITION query, as sketched below.
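A sketch of such a retention job, using the HTTP /exec endpoint (Node 18+ fetch); the host name, table and retention period are illustrative rather than an agreed policy:

// Sketch: drop partitions older than a retention period (in days).
async function dropOldPartitions(table, retainDays) {
  const cutoff = new Date(Date.now() - retainDays * 86400000).toISOString().slice(0, 10);
  const sql = `ALTER TABLE ${table} DROP PARTITION WHERE timestamp < to_timestamp('${cutoff}', 'yyyy-MM-dd')`;
  const res = await fetch('http://quest01:9000/exec?query=' + encodeURIComponent(sql));
  console.log(table, await res.json());
}

dropOldPartitions('readings', 7).catch(console.error);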
The following retention times are suggested for different tables
Modified versions of the scripts to load these from MQTT were written; some issues were found with these:
In order to auto-start on boot, a service file is needed as follows:
[Unit]
Description=QuestDB Service
After=network.target
[Service]
Environment="JAVA_HOME=/usr/lib/jvm/java-17-openjdk-arm64"
ExecStart=/home/jules/questdb/questdb.sh start -n
ExecStop=/home/jules/questdb/questdb.sh stop
Restart=on-failure
User=jules
RestartSec=5
WorkingDirectory=/home/jules/questdb
[Install]
WantedBy=multi-user.target
This is an adaptation of the instructions for starting QuestDB in a terminal, with the addition of:
Each of the service files quest01_readings, quest01_lights, quest01_programs needs the Unit section modifying as follows:
[Unit]
Description=QuestDB Readings Service
After=questdb.service
Wants=questdb.service
Power consumption with questdb, quest01_readings, quest01_lights, quest01_programs and the server (idle) varies between 0.58 A and 0.85 A, with the average being 0.7 A, all at 5.18 V, so an average of 3.83 W.
By default QuestDB had collected about 16 GB of log files in 4 days. A first attempt to get control of this was made using logrotate; I installed and configured it.
sudo apt-get install logrotate
Logrotate needs the configuration file /etc/logrotate.d/questdb
~/.questdb/log/*.txt {
daily
rotate 2
compress
delaycompress
missingok
notifempty
copytruncate
create 640 jules jules
}
For the configuration to take effect logrotate needs to be run, and for it to run daily it needs to be set up as a cron job.
HOWEVER, QuestDB also has its own sophisticated configuration to control logs, which is probably better than logrotate. This is set up via ~/.questdb/conf/log.conf as follows:
# list of configured writers
writers=file,stdout,http.min
#,alert
# file writer
#w.file.class=io.questdb.log.LogFileWriter
#w.file.location=questdb-debug.log
#w.file.level=INFO,ERROR
# rolling file writer
w.file.class=io.questdb.log.LogRollingFileWriter
w.file.location=${log.dir}/questdb-rolling.log.${date:yyyyMMdd}
w.file.level=INFO,ERROR
#rollEvery accepts: day, hour, minute, month
w.file.rollEvery=day
#rollSize specifies size at which to roll a new log file: a number followed by k, m, g (KB, MB, GB respectively)
w.file.rollSize=128m
#lifeDuration accepts: a number followed by s, m, h, d, w, M, y for seconds, minutes, hours, etc.
w.file.lifeDuration=1d
#sizeLimit is the max fileSize of the log directory. Follows same format as rollSize
w.file.sizeLimit=1g
# stdout
w.stdout.class=io.questdb.log.LogConsoleWriter
w.stdout.level=ERROR
# alert
# w.alert.class=io.questdb.log.LogAlertSocketWriter
# w.alert.level=CRITICAL
# w.alert.location=/alert-manager-tpt.json
# w.alert.alertTargets=localhost:9093,209.111.255.57:2468, ... ,localhost:9999
# w.alert.inBufferSize=2M
# w.alert.outBufferSize=4M
# w.alert.reconnectDelay=250 # 1/4th sec (milli precision)
# w.alert.defaultAlertHost=127.0.0.1
# w.alert.defaultAlertPort=9093
# disable logging out of min http server, which is supposed to be used
# for frequent monitoring
w.http.min.class=io.questdb.log.LogConsoleWriter
w.http.min.level=ERROR
w.http.min.scope=http-min-server
After restarting QuestDB, logs are now created as ~/.questdb/log/questdb-rolling.log.20240910.
This creates files of no more than 128 MB and a total of 1 GB altogether, so there is no need to run logrotate.
At the head of the log file from system start (head [logfile] --lines=100) there are some warnings:
A server-main fs.file-max checked [limit=524288]
A server-main vm.max_map_count limit is too low [limit=65530] (SYSTEM COULD BE UNSTABLE)
2024-09-10T08:52:55.238986Z A server-main make sure to increase fs.file-max and vm.max_map_count limits:
capacity-planning/#os-configutation
Neither of these appears at present to be causing problems; this needs to be kept under observation. There is also this warning:
snapshot root: [path=/home/jules/.questdb/snapshot, magic=0x00] -> UNSUPPORTED (SYSTEM COULD BE UNSTABLE)
After about 2 weeks running there is a noticeable deterioration in performance. It is proposed to resolve this by:
The original readings table received an insert for every board/pin combination at quite a high frequency; stopping the quest01_readings service has dramatically reduced CPU load.
One of the main causes of traffic was current measurement for LightBoards. Instead of the readings table, a light_status table has been implemented: one MQTT packet covering all light boards is produced once a minute, so with this implementation there is one row per board.
There are the following classes and tables:

Class | Implementation |
---|---|
LightBoard | |
powerwall | |
SwitchBoard | |
ThermoCouple | |
HeatmiserV3Slave | |
Alarm | |
octopusrates | |
octopususage | |
solcast | There is now provision for more than one site; investigate with solcast how many arrays we can have - can we support east and west facing roofs as well? |
Plug | |
Car | |
openevse | |
weather | |
carbonintensity | |
batteryshunt | Have either an RS485 to USB cable to connect to a Pi, or possibly bluetooth |
Victron IP22 | Have either bluetooth or VEbus pins inside to connect to a UART to USB converter |
solarthermal | Investigate replacing the RESOL BS/4 with one with VBus and get either |
ctreader | Investigate fitting a cheap board to measure current for up to 4 devices from SCT-013. Implemented a circuit which biases one leg of the CT with a potential divider across 3.3 V with an AC filter, so that DC is lifted to 1.65 V but anything over 3.8 Hz is sent to ground. The other leg of the CT is connected to an analog input of the TM4C. As of 29/11/2024 this needs to be debugged. |
ftc6 | New heat pump flow controller, with a MODBUS interface installed. Modbus interface implemented on the TM4C. See heating upgrade for further details. Update scripts are listed in the table below. |

table | update script | Server |
---|---|---|
ftc_daily | quest01_ftc6 | TODO |
ftc_performance | quest01_ftc6 | Implemented |
For each class, the following need to be implemented:
And optionally
After nearly 3 months of running, several larger tables stopped updating. The first clue came from selecting rows from these tables: all these selections ended at about 2024-11-25T15:45.
As the tables are partitioned by day, list the partitions:
SHOW PARTITIONS FROM {tablename}
This confirms that all these tables ended at about the same time.
Inspecting the logs confirms the cause of the failure; first we list the logs to see which one to look at:
jules@quest01:~/.questdb/log $ ls
questdb-rolling.log.20241128 stdout-2024-09-17T15-55-27.txt stdout-2024-10-06T11-13-04.txt stdout-2024-11-27T16-35-35.txt
questdb-rolling.log.20241129 stdout-2024-09-19T16-45-11.txt stdout-2024-10-26T10-29-53.txt stdout-2024-11-27T16-40-07.txt
stdout-2024-09-17T15-54-39.txt stdout-2024-09-21T16-52-31.txt stdout-2024-11-25T15-45-22.txt
stdout-2024-09-17T15-54-43.txt stdout-2024-09-21T17-00-10.txt stdout-2024-11-27T13-06-40.txt
The above listing shows that stdout-2024-11-25T15-45-22.txt seems to end at about the same time as the tables, so looking at the tail of it should show what happened.
jules@quest01:~/.questdb/log $ tail stdout-2024-11-25T15-45-22.txt
2024-11-25T15:45:35.754636Z A pg-server listening on 0.0.0.0:8812 [fd=111 backlog=64]
2024-11-25T15:45:35.761813Z A tcp-line-server listening on 0.0.0.0:9009 [fd=113 backlog=256]
2024-11-25T15:45:36.106464Z A server-main server is ready to be started
2024-11-25T15:45:36.210893Z A server-main enjoy
2024-11-25T15:45:36.565780Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=programs_direct~11, error=segment /home/jules/.questdb/db/programs_direct~11/wal5776/0/_event.i does not have txn with id 9, offset=611, indexFileSize=104, maxTxn=8, size=611, errno=0]
2024-11-25T15:45:36.567543Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=light_status~21, error=segment /home/jules/.questdb/db/light_status~21/wal9/4/_event.i does not have txn with id 4721, offset=316315, indexFileSize=37784, maxTxn=4720, size=316315, errno=0]
2024-11-25T15:45:36.691832Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=switch_status~24, error=segment /home/jules/.questdb/db/switch_status~24/wal4/5/_event.i does not have txn with id 100, offset=6708, indexFileSize=816, maxTxn=99, size=6708, errno=0]
2024-11-25T15:45:36.723872Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=openevse~27, error=segment /home/jules/.questdb/db/openevse~27/wal4/0/_event.i does not have txn with id 53466, offset=2673308, indexFileSize=427744, maxTxn=53465, size=2673308, errno=0]
2024-11-25T15:45:36.777000Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=powerwall_meters~18, error=segment /home/jules/.questdb/db/powerwall_meters~18/wal7113/0/_event.i does not have txn with id 1403, offset=94009, indexFileSize=11248, maxTxn=1402, size=94009, errno=0]
2024-11-25T15:45:36.802191Z C i.q.c.w.ApplyWal2TableJob job failed, table suspended [table=powerwall_soe~17, error=segment /home/jules/.questdb/db/powerwall_soe~17/wal1820/0/_event.i does not have txn with id 1719, offset=85958, indexFileSize=13776, maxTxn=1718, size=85958, errno=0]
Here we see that 6 tables have been suspended because a transaction id is missing from their WAL segments.
Clearly I should have implemented a recovery strategy earlier!
The Enterprise version of QuestDB lets you checkpoint the database, which stops updates, and then make a copy. Something similar could be achieved by stopping the database, copying it, and starting it again.
In practice the data can be extracted on a daily basis: a script selects all rows from each table for the previous day and saves the result to a JSON file. This script is extract_day.js.
It lives in ~/Projects/quest01/scripts and is run as follows:
node ~/Projects/quest01/scripts/extract_day.js [options]
[options]:
For each day a folder is created in ~/Projects/quest01/data which contains each of the extracted tables.
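The real extract_day.js is not reproduced here; a minimal sketch of the same idea, using the /exec endpoint and an illustrative table list and output layout, would be:

// Sketch: export one day of each table to a JSON file via /exec.
const fs = require('fs');
const path = require('path');

async function extractDay(day) {                        // day = 'YYYY-MM-DD'
  const tables = ['readings', 'programs', 'lights'];    // illustrative subset
  const outDir = path.join(process.env.HOME, 'Projects/quest01/data', day);
  fs.mkdirSync(outDir, { recursive: true });

  for (const table of tables) {
    const sql = `SELECT * FROM ${table} WHERE timestamp IN '${day}'`;   // whole-day interval
    const res = await fetch('http://quest01:9000/exec?query=' + encodeURIComponent(sql));
    fs.writeFileSync(path.join(outDir, `${table}.json`), JSON.stringify(await res.json()));
  }
}

// default to yesterday when no date is given
extractDay(process.argv[2] || new Date(Date.now() - 86400000).toISOString().slice(0, 10))
  .catch(console.error);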
A cron job has been created on the desktop which runs 5 minutes after boot; the entry in crontab -e is as follows:
@reboot /usr/bin/sleep 300 && /usr/bin/node /home/jules/Projects/quest01/scripts/extract_day.js yesterday
Without further action suspended tables cannot be written to. The procedure for repairing tables, in this case at least, is to drop the broken partition and possibly rebuild it. Dropping the partition is done as follows:
ALTER TABLE {tablename} SET TYPE BYPASS WAL
~/questdb/questdb.sh stop
~/questdb/questdb.sh start
ALTER TABLE {tablename} DROP PARTITION LIST '2024-11-25'
Or whatever date is identified as faulty
ALTER TABLE {tablename} SET TYPE WAL
After which the missing data can be restored, for example from the daily JSON extracts (see the sketch below).
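As a rough sketch (not the actual procedure), a day extracted in this way could be re-inserted for a simple table such as readings using the Node.js Sender; the host and file names are illustrative, and timestamp precision is reduced to milliseconds:

// Sketch: re-insert one extracted day of the readings table from the JSON
// saved by the extraction step ({ columns, dataset } as returned by /exec).
const fs = require('fs');
const { Sender } = require('@questdb/nodejs-client');

async function restoreReadings(file) {
  const { columns, dataset } = JSON.parse(fs.readFileSync(file, 'utf8'));
  const col = (name) => columns.findIndex((c) => c.name === name);
  const [iTs, iBoard, iPin, iReading] = [col('timestamp'), col('board'), col('pin'), col('reading')];

  const sender = Sender.fromConfig('http::addr=quest01:9000');
  for (const row of dataset) {
    await sender
      .table('readings')
      .symbol('board', String(row[iBoard]))
      .intColumn('pin', row[iPin])            // assumed to fit the existing SHORT column
      .floatColumn('reading', row[iReading])
      .at(Date.parse(row[iTs]), 'ms');
  }
  await sender.flush();
  await sender.close();
}

restoreReadings('readings.json').catch(console.error);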