Asynchronous JSON conversion
Normally, Newt DB converts data to JSON as it’s saved in the database, and any indexes defined on the JSON data are updated at the same time. With the default JSON index, informal tests show Newt DB write performance to be around 10 percent slower than RelStorage. Adding a text index made writes take about twice as long, although they were still fairly fast: several hundred per second on a laptop.
If you have a lot of indexes to update, or write performance is critical, you may want to use Newt DB’s ability to update the JSON data asynchronously. Doing so allows the primary transactions to execute more quickly.
Updating indexes asynchronously will usually be more efficient, because Newt DB’s asynchronous updater batches updates. When indexes are updated for many objects in the same transaction, less data has to be written per transaction.
If you want to try Newt DB with an existing RelStorage/PostgreSQL database, you can use the updater to populate Newt DB without changing your application, and then introduce the use of Newt DB’s search API gradually.
There are some caveats, however:
- Because updates are asynchronous, search results may not always reflect the current data.
- Packing requires some special care, as will be discussed below.
- You’ll need to run a separate daemon, newt-updater, in addition to your database server.
Using Newt’s Asynchronous Updater
To use Newt’s asynchronous updater:
Omit the newt tag from your database configuration, as in:

    %import newt.db

    <newtdb foo>
      <zodb>
        <relstorage>
          keep-history false
          <postgresql>
            dsn postgresql://localhost/mydb
          </postgresql>
        </relstorage>
      </zodb>
    </newtdb>
Run the newt-updater script:

    newt-updater postgresql://localhost/mydb
You’ll want to run this using a daemonizer like supervisord or ZDaemon.
newt-updater has a number of options:

-l, --logging-configuration
    Logging configuration. This can be a log level, like

-g, --gc-only
    Collect garbage and exit. This removes Newt DB records that don’t have corresponding database records. This is done by executing:

        delete from newt n where not exists (
          select from object_state s where n.zoid = s.zoid)

    Note that garbage collection is normally performed on startup unless the -G option is used.

-G, --no-gc
    Don’t perform garbage collection on startup.

--nagios
    Check the status of the updater. The status is determined from the updater lag, which is the difference between the last transaction committed to the database and the last transaction processed by the updater. The option takes two numbers, separated by a comma. The first number is the maximum lag, in seconds, at which the updater is considered to be OK. The second number is the maximum lag at which the updater isn’t considered to be in error. For example, 1,99 indicates OK if the lag is 1 second or less, WARNING if it is more than 1 and less than or equal to 99 seconds, and ERROR if it is more than 99 seconds.

-t, --poll-timeout
    Specify a poll timeout, in seconds. Normally, the updater is notified to poll for changes. If it doesn’t get notified in poll-timeout seconds, it will poll anyway. This is a backstop to PostgreSQL’s notification. The default timeout is 300 seconds.

-m, --transaction-size-limit
    The target transaction batch size. This limits (loosely) the number of records processed in a batch. Larger batches incur less overhead, but long-lasting transactions can interfere with other processing. The default is 100 thousand records. This option only comes into play when a large number of records have to be processed, typically when first running the updater or using the

-T, --remove-delete-trigger
    Remove the Newt DB delete trigger, if it exists. The Newt DB delete trigger is incompatible with the updater. It can cause deadlock errors if the database is packed while the updater is running. This option is needed if you set up Newt DB normally and later decide that you want to update Newt DB asynchronously.

-d, --driver
    Provide an explicit Postgres driver name (psycopg2 or psycopg2cffi). By default, the appropriate driver will be selected automatically.

--compute-missing
    Compute missing newt records. Rather than processing new records, process records written up through the current time and stop. Only missing records are updated. This option requires PostgreSQL 9.5 or later. This is used to compute newt records after adding Newt DB to an existing PostgreSQL RelStorage application.
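To make the idea of a loose batch limit more concrete, here is a minimal sketch of one way such a limit can behave. It is an illustration only, not Newt DB's actual code: the function name loose_batches and the (transaction id, record count) input shape are assumptions made for this example. Whole transactions are added to a batch until the record count reaches the limit, so a batch can overshoot the limit by part of one transaction.

```python
from typing import Iterable, Iterator, List, Tuple


def loose_batches(
    transactions: Iterable[Tuple[int, int]], limit: int
) -> Iterator[List[int]]:
    """Group transactions into batches with a loose record limit.

    Hypothetical illustration, not taken from Newt DB. Each input item is
    a (transaction_id, record_count) pair. Transactions are never split,
    so a batch is closed only after the running record count reaches
    ``limit`` and may therefore exceed it.
    """
    batch: List[int] = []
    records = 0
    for tid, record_count in transactions:
        batch.append(tid)
        records += record_count
        if records >= limit:
            # The limit has been reached (or passed): emit this batch
            # and start a new one.
            yield batch
            batch, records = [], 0
    if batch:
        # Emit whatever is left over as a final, smaller batch.
        yield batch
```

With a limit of 100 and transactions holding 60, 60 and 10 records, the first two transactions form one batch of 120 records and the third forms its own batch. A larger limit means fewer, longer-running batches, which is the trade-off the option description mentions.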
Garbage collection
The asynchronous updater tracks new database inserts and updates. When a database is packed, records are removed without generating updates, so those deletes won’t be reflected in Newt DB. You can tell the updater to clean up Newt DB records for which there are no longer database records by either restarting it or running it with the -g option:
newt-updater -g postgresql://localhost/mydb
This tells the updater to just collect garbage. You’ll probably want to run this right after running zodbpack.
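If packing is scripted, the garbage collection step can be chained directly after it. The following is a minimal sketch under stated assumptions: the helper names are invented for this example, and pack.conf is a hypothetical zodbpack configuration file that you would supply yourself. The newt-updater invocation is the one shown above.

```python
import subprocess
from typing import List, Sequence


def pack_and_collect_commands(pack_config: str, dsn: str) -> List[List[str]]:
    """Return the commands to run, in order: pack, then collect garbage.

    ``pack_config`` is the path to a zodbpack configuration file
    (hypothetical here) and ``dsn`` is the PostgreSQL connection string
    passed to newt-updater.
    """
    return [
        ["zodbpack", pack_config],
        ["newt-updater", "-g", dsn],
    ]


def run_all(commands: Sequence[Sequence[str]]) -> None:
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so garbage collection is skipped if packing fails.
        subprocess.run(list(command), check=True)


# Example (not executed here):
# run_all(pack_and_collect_commands("pack.conf", "postgresql://localhost/mydb"))
```

Keeping command construction separate from execution makes the sequence easy to inspect or log before anything is run.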
Monitoring
When running an external updater, like newt-updater, you’ll want to have some way to monitor that it’s working correctly. The --nagios option of the newt-updater script can be used to provide a Nagios Plugin:
newt-updater postgresql://localhost/mydb --nagios 3,99
The argument to the --nagios option is a pair of numbers giving limits for OK and warning alerts. They’re based on how far behind the updater is. In the example above, the monitor considers the updater to be OK if it is 3 seconds behind or less, in error if it is more than 99 seconds behind, and of concern otherwise.
Any monitoring system compatible with the Nagios plugin API can be used.
The monitor output includes the lag (how far behind the updater is, in seconds) as a performance metric.
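The threshold rule described above can be sketched in a few lines. This is an illustration of the rule only, not the updater's implementation: the function names are invented for this example, and mapping the error state to exit code 2 (which Nagios calls CRITICAL) is an assumption based on the standard Nagios plugin exit codes.

```python
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def parse_thresholds(arg: str) -> Tuple[float, float]:
    """Parse an argument such as "3,99" into (ok_limit, warning_limit)."""
    ok_limit, warning_limit = (float(part) for part in arg.split(","))
    if ok_limit > warning_limit:
        raise ValueError("the OK limit must not exceed the warning limit")
    return ok_limit, warning_limit


def classify_lag(lag_seconds: float, thresholds: str) -> int:
    """Map an updater lag, in seconds, to a Nagios exit code.

    Hypothetical helper: at or below the first limit is OK, above the
    first and at or below the second is WARNING, anything larger is
    treated as an error (CRITICAL, assumed here).
    """
    ok_limit, warning_limit = parse_thresholds(thresholds)
    if lag_seconds <= ok_limit:
        return OK
    if lag_seconds <= warning_limit:
        return WARNING
    return CRITICAL
```

With thresholds of 3,99, a lag of 3 seconds is OK, a lag of exactly 99 seconds is still only a warning, and a lag of 100 seconds is an error.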