RDB$BLOB_UTIL system package. #281
# `RDB$BLOB_UTIL` package (FB 5.0)

This package exists to manipulate BLOBs in ways that standard Firebird functions, like `BLOB_APPEND` and `SUBSTRING`, either cannot do or can do only very slowly.

These routines operate on binary data directly, even for text BLOBs.

## Function `NEW_BLOB`

`RDB$BLOB_UTIL.NEW_BLOB` is used to create a new BLOB. It returns a BLOB suitable for data appending, like `BLOB_APPEND` does.

The advantage over `BLOB_APPEND` is that it's possible to set custom `SEGMENTED` and `TEMP_STORAGE` options.

`BLOB_APPEND` always creates BLOBs in temporary storage. That may not be the best approach if the created BLOB is going to be stored in a permanent table, as it will require a copy.

The BLOB returned by this function, even when `TEMP_STORAGE = FALSE`, may be used with `BLOB_APPEND` for appending data.

Input parameters:
- `SEGMENTED` type `BOOLEAN NOT NULL`
- `TEMP_STORAGE` type `BOOLEAN NOT NULL`
> Review comments:
>
> - Perhaps we should be prepared for the tablespaces feature, so that blobs could be created in an explicitly specified (by name) tablespace, which can be either "permanent" or "temporary". This requires more thinking.
> - A new parameter with a default. Also named arguments, as in Oracle.
> - The problem is that …
> - Do you know how the same problem is going to be resolved in regard to storage specified in the BPB?
> - I think the lower-level options (…

Return type: `BLOB NOT NULL`.
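
A minimal sketch (not part of the original document) of the permanent-storage case described above; the table `demo` is a hypothetical example table:

```
-- Illustrative sketch: create a stream BLOB directly in permanent storage
-- (SEGMENTED = FALSE, TEMP_STORAGE = FALSE) and store it in a table.
create table demo (b blob);

set term !;

execute block
as
    declare b blob;
begin
    b = rdb$blob_util.new_blob(false, false);

    -- The returned BLOB may still be appended to with BLOB_APPEND.
    b = blob_append(b, 'some data');

    insert into demo (b) values (:b);
end!

set term ;!
```

Compare with Example 1 below, which creates the BLOB in temporary storage instead.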

## Function `OPEN_BLOB`

`RDB$BLOB_UTIL.OPEN_BLOB` is used to open an existing BLOB for reading. It returns a handle (an integer bound to the transaction) suitable for use with other functions of this package, like `SEEK`, `READ_DATA` and `CLOSE_HANDLE`.

Input parameter:
- `BLOB` type `BLOB NOT NULL`

Return type: `INTEGER NOT NULL`.

## Function `IS_WRITABLE`

`RDB$BLOB_UTIL.IS_WRITABLE` returns `TRUE` when the BLOB is suitable for data appending with `BLOB_APPEND` without copying.

Input parameter:
- `BLOB` type `BLOB NOT NULL`
> Review comments:
>
> - I believe we need yet another routine -- something like APPEND_BLOB -- to concatenate the whole other blob if it's longer than 32KB. Here "BLOB" in the name again seems redundant ;-) so better naming ideas are welcome. Or we should find a way to make APPEND polymorphic in regard to its input.
> - Perhaps we could create a VARIANT type which could be used as system routine arguments - and also as a general data type. System functions can already work in this way, but they do not have stored metadata.
> - While the …

Return type: `BOOLEAN NOT NULL`.

## Function `READ_DATA`

`RDB$BLOB_UTIL.READ_DATA` is used to read chunks of data from a BLOB handle opened with `RDB$BLOB_UTIL.OPEN_BLOB`. When the BLOB is fully read and there is no more data, it returns `NULL`.

If `LENGTH` is passed with a positive number, it returns a VARBINARY of at most that length.

If `LENGTH` is `NULL`, it returns just a segment of the BLOB with a maximum length of 32765.
> Review comments:
>
> - IIRC, blob segments may be up to 64KB in length. Will a longer-than-32KB segment be truncated to 32KB and the next READ call return the remaining part? Can there be any consequences if the segment is split into multiple parts? For example, one source segment will be written as two segments in the target blob. Blob filters may not be able to decode half-chunks properly (firstly it's about built-in transliteration filters -- think about splitting in the middle of a multi-byte character -- although perhaps they never deal with chunks longer than 32KB).
> - We could increase max VARCHAR to 64KB - 2. But also, is there an impediment to having the max dsc_length of dtype_varying at 64KB? As I understand, there should not be many places just reading and incrementing dsc_length, so dtype_cstring / dtype_varying does not need to have the constant size added to dsc_length. It would simplify various places that subtract and re-add that value.
> - Well, the segment split problem remains anyway, if the …

Input parameters:
- `HANDLE` type `INTEGER NOT NULL`
- `LENGTH` type `INTEGER`

Return type: `VARBINARY(32767)`.
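
As a sketch (not part of the original document) of how `OPEN_BLOB`, `READ_DATA` and `CLOSE_HANDLE` combine to consume a BLOB of unknown size, here is a hypothetical loop that counts the bytes of a BLOB in fixed-size chunks:

```
-- Illustrative sketch: read a BLOB in 8192-byte chunks until READ_DATA
-- returns NULL, counting the total number of bytes.
execute block returns (total_length integer)
as
    declare b blob = 'an example blob value';
    declare bhandle integer;
    declare chunk varbinary(8192);
begin
    total_length = 0;
    bhandle = rdb$blob_util.open_blob(b);

    chunk = rdb$blob_util.read_data(bhandle, 8192);
    while (chunk is not null) do
    begin
        total_length = total_length + octet_length(chunk);
        chunk = rdb$blob_util.read_data(bhandle, 8192);
    end

    execute procedure rdb$blob_util.close_handle(bhandle);
    suspend;
end
```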

## Function `SEEK`

`RDB$BLOB_UTIL.SEEK` is used to set the position for the next `READ_DATA`. It returns the new position.

`MODE` may be 0 (from the start), 1 (from the current position) or 2 (from the end).

When `MODE` is 2, `OFFSET` should be zero or negative.

Input parameters:
- `HANDLE` type `INTEGER NOT NULL`
- `MODE` type `INTEGER NOT NULL`
- `OFFSET` type `INTEGER NOT NULL`

Return type: `INTEGER NOT NULL`.

## Procedure `CANCEL_BLOB`

`RDB$BLOB_UTIL.CANCEL_BLOB` is used to immediately release a temporary BLOB, like one created with `BLOB_APPEND`.

Note that if the same BLOB is used after being cancelled, whether through the same variable or through another one referencing the same BLOB id, an "invalid blob id" error will be raised.
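
A minimal sketch (not in the original document) of `CANCEL_BLOB`, assuming it takes the BLOB to release as its single argument (its parameter list is not spelled out above):

```
-- Illustrative sketch: build a temporary BLOB and release it immediately
-- instead of keeping it until transaction end.
execute block
as
    declare b blob;
begin
    b = blob_append(null, 'temporary data');

    -- Release the temporary BLOB right away.
    execute procedure rdb$blob_util.cancel_blob(b);

    -- Any further use of "b" (or of another variable holding the same
    -- BLOB id) would raise an "invalid blob id" error.
end
```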

## Procedure `CLOSE_HANDLE`

`RDB$BLOB_UTIL.CLOSE_HANDLE` is used to close a BLOB handle opened with `RDB$BLOB_UTIL.OPEN_BLOB`.

Handles that are not closed explicitly are closed automatically only at the end of the transaction.
> Review comment:
>
> - With parameters being BLOB rather than INTEGER handle, I'd just call it "CLOSE". And maybe think about "auto-close" scenarios in some cases.

Input parameter:
- `HANDLE` type `INTEGER NOT NULL`

# Examples

- Example 1: Create a BLOB in temporary space and return it in `EXECUTE BLOCK`:

```
execute block returns (b blob)
as
begin
    -- Create a BLOB handle in the temporary space.
    b = rdb$blob_util.new_blob(false, true);

    -- Add chunks of data.
    b = blob_append(b, '12345');
    b = blob_append(b, '67');

    suspend;
end
```
- Example 2: Open a BLOB and return chunks of it with `EXECUTE BLOCK`:

```
execute block returns (s varchar(10))
as
    declare b blob = '1234567';
    declare bhandle integer;
begin
    -- Open the BLOB and get a BLOB handle.
    bhandle = rdb$blob_util.open_blob(b);

    -- Get chunks of data as string and return.

    s = rdb$blob_util.read_data(bhandle, 3);
    suspend;

    s = rdb$blob_util.read_data(bhandle, 3);
    suspend;

    s = rdb$blob_util.read_data(bhandle, 3);
    suspend;

    -- Here EOF is found, so it returns NULL.
    s = rdb$blob_util.read_data(bhandle, 3);
    suspend;

    -- Close the BLOB handle.
    execute procedure rdb$blob_util.close_handle(bhandle);
end
```
- Example 3: Seek in a blob.

```
create table t(b blob);

set term !;

execute block returns (s varchar(10))
as
    declare b blob;
    declare bhandle integer;
begin
    -- Create a stream BLOB handle.
    b = rdb$blob_util.new_blob(false, true);

    -- Add data.
    b = blob_append(b, '0123456789');

    -- Materialize the BLOB.
    insert into t (b) values (:b);

    -- Open the BLOB.
    bhandle = rdb$blob_util.open_blob(b);

    -- Seek to 5 since the start.
    rdb$blob_util.seek(bhandle, 0, 5);
    s = rdb$blob_util.read_data(bhandle, 3);
    suspend;

    -- Seek to 2 since the start.
    rdb$blob_util.seek(bhandle, 0, 2);
    s = rdb$blob_util.read_data(bhandle, 3);
    suspend;

    -- Advance 2.
    rdb$blob_util.seek(bhandle, 1, 2);
    s = rdb$blob_util.read_data(bhandle, 3);
    suspend;

    -- Seek to -1 since the end.
    rdb$blob_util.seek(bhandle, 2, -1);
    s = rdb$blob_util.read_data(bhandle, 3);
    suspend;
end!

set term ;!
```
- Example 4: Check if blobs are writable:

```
create table t(b blob);

set term !;

execute block returns (bool boolean)
as
    declare b blob;
begin
    b = blob_append(null, 'writable');
    bool = rdb$blob_util.is_writable(b);
    suspend;

    insert into t (b) values ('not writable') returning b into b;
    bool = rdb$blob_util.is_writable(b);
    suspend;
end!

set term ;!
```
> Review comments:
>
> - Do we really need such an artificial for SQL concept as a "handle" here? Every blob is represented with a blob ID, which actually is a handle. Passing a blob here and there inside PSQL (except assigning to a table field) is just a matter of copying its ID; the contents are not touched. So tra_blob_util_map may just store blob IDs created/opened with the RDB$BLOB_UTIL package, and all package functions may declare inputs/outputs as just BLOB instead of an INTEGER handle. Do I miss anything?
> - A blob id is used in the client with a handle. A handle in this context is an id plus the blb class inside the engine. A blb has information like the current position. RDB$BLOB_UTIL handles model this concept in PSQL. A blob id for this would be very confusing: many different variables would have the same id, so how could one have multiple parallel seek/read operations on the same blob id? Also, a blob id is implicitly copied, depending on the blob charset, when passed as an argument.