RDB$BLOB_UTIL system package. #281

Merged on Dec 16, 2022 (15 commits)
Changes from 11 commits
2 changes: 2 additions & 0 deletions builds/win32/msvc15/engine_static.vcxproj
@@ -58,6 +58,7 @@
<ClCompile Include="..\..\..\src\jrd\Attachment.cpp" />
<ClCompile Include="..\..\..\src\jrd\blb.cpp" />
<ClCompile Include="..\..\..\src\jrd\blob_filter.cpp" />
<ClCompile Include="..\..\..\src\jrd\BlobUtil.cpp" />
<ClCompile Include="..\..\..\src\jrd\btn.cpp" />
<ClCompile Include="..\..\..\src\jrd\btr.cpp" />
<ClCompile Include="..\..\..\src\jrd\builtin.cpp" />
@@ -220,6 +221,7 @@
<ClInclude Include="..\..\..\src\jrd\blb_proto.h" />
<ClInclude Include="..\..\..\src\jrd\blf_proto.h" />
<ClInclude Include="..\..\..\src\jrd\blob_filter.h" />
<ClInclude Include="..\..\..\src\jrd\BlobUtil.h" />
<ClInclude Include="..\..\..\src\jrd\blp.h" />
<ClInclude Include="..\..\..\src\jrd\blr.h" />
<ClInclude Include="..\..\..\src\jrd\btn.h" />
6 changes: 6 additions & 0 deletions builds/win32/msvc15/engine_static.vcxproj.filters
@@ -216,6 +216,9 @@
<ClCompile Include="..\..\..\src\jrd\blob_filter.cpp">
<Filter>JRD files</Filter>
</ClCompile>
<ClCompile Include="..\..\..\src\jrd\BlobUtil.cpp">
<Filter>JRD files</Filter>
</ClCompile>
<ClCompile Include="..\..\..\src\jrd\btn.cpp">
<Filter>JRD files</Filter>
</ClCompile>
@@ -680,6 +683,9 @@
<ClInclude Include="..\..\..\src\jrd\blob_filter.h">
<Filter>Header files</Filter>
</ClInclude>
<ClInclude Include="..\..\..\src\jrd\BlobUtil.h">
<Filter>Header files</Filter>
</ClInclude>
<ClInclude Include="..\..\..\src\jrd\blp.h">
<Filter>Header files</Filter>
</ClInclude>
203 changes: 203 additions & 0 deletions doc/sql.extensions/README.blob_util.md
@@ -0,0 +1,203 @@
# `RDB$BLOB_UTIL` package (FB 5.0)

This package provides routines to manipulate BLOBs in ways that standard Firebird functions, such as `BLOB_APPEND` and `SUBSTRING`, either cannot do or do very slowly.

These routines operate on binary data directly, even for text BLOBs.

## Function `NEW_BLOB`

`RDB$BLOB_UTIL.NEW_BLOB` is used to create a new BLOB. It returns a BLOB suitable for data appending, like `BLOB_APPEND` does.

The advantage over `BLOB_APPEND` is that it's possible to set custom `SEGMENTED` and `TEMP_STORAGE` options.

`BLOB_APPEND` always creates the BLOB in temporary storage. That may not be the best approach if the BLOB is going to be stored in a permanent table, because storing it then requires a copy.

The BLOB returned by this function, even when `TEMP_STORAGE = FALSE`, may be used with `BLOB_APPEND` to append data.

Member

Do we really need a concept as artificial for SQL as a "handle" here? Every blob is represented by a blob ID, which actually is a handle. Passing a blob here and there inside PSQL (except assigning it to a table field) is just a matter of copying its ID; the contents are not touched. So tra_blob_util_map may just store the blob IDs created/opened with the RDB$BLOB_UTIL package. And all package functions may declare inputs/outputs as just BLOB instead of an INTEGER handle. Am I missing anything?

Member Author

A blob id is used in the client together with a handle. A handle in this context is an id plus the blb class inside the engine. A blb holds information such as the current position. RDB$BLOB_UTIL handles model this concept in PSQL.

Using a blob id for this would be very confusing. Many different variables would have the same id, so how could one have multiple parallel seeks/reads on the same blob id?

Also, a blob id is implicitly copied, depending on the blob charset, when blobs are passed as arguments.

Input parameters:
- `SEGMENTED` type `BOOLEAN NOT NULL`
- `TEMP_STORAGE` type `BOOLEAN NOT NULL`

Member

Perhaps we should be prepared for the tablespaces feature, so that blobs could be created in the explicitly specified (by name) tablespace which can be either "permanent" or "temporary". This requires more thinking.

Member Author

That could be a new parameter with a default. Named arguments, like Oracle's `=>`, would also be very interesting.

Member

The problem is that TEMP_STORAGE may conflict with the tablespace, e.g. TEMP_STORAGE = TRUE but TABLESPACE = MY_BLOB_SPACE. This looks error-prone.

Member Author

Do you know how the same problem is going to be resolved in regard to storage specified in BPB?

Member

I think the lower-level options (TEMP_STORAGE parameter or isc_bpb_storage_temp) should override the DDL-level default storage.


Return type: `BLOB NOT NULL`.
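
As a minimal sketch of the permanent-storage case described above, the block below creates a BLOB with `TEMP_STORAGE = FALSE`, appends to it with `BLOB_APPEND`, and stores it in a table. The table `docs` and its column `content` are hypothetical names used only for illustration.

```
create table docs (content blob);

set term !;

execute block
as
    declare b blob;
begin
    -- Stream BLOB (SEGMENTED = FALSE) created outside temporary storage
    -- (TEMP_STORAGE = FALSE).
    b = rdb$blob_util.new_blob(false, false);

    -- Append data as usual.
    b = blob_append(b, 'first chunk, ');
    b = blob_append(b, 'second chunk');

    -- Store the BLOB in the hypothetical permanent table.
    insert into docs (content) values (:b);
end!

set term ;!
```

Choosing `TEMP_STORAGE = FALSE` here is meant to avoid the extra copy that a BLOB created in temporary storage would need when it is stored in a permanent table.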

## Function `OPEN_BLOB`

`RDB$BLOB_UTIL.OPEN_BLOB` is used to open an existing BLOB for reading. It returns a handle (an integer bound to the transaction) suitable for use with other functions of this package, like `SEEK`, `READ_DATA` and `CLOSE_HANDLE`.

Input parameter:
- `BLOB` type `BLOB NOT NULL`

Return type: `INTEGER NOT NULL`.

## Function `IS_WRITABLE`

`RDB$BLOB_UTIL.IS_WRITABLE` returns `TRUE` when data can be appended to the BLOB with `BLOB_APPEND` without copying it.

Input parameter:
- `BLOB` type `BLOB NOT NULL`

Member

I believe we need yet another routine -- something like APPEND_BLOB -- to concatenate a whole other blob if it's longer than 32KB. Here "BLOB" in the name again seems redundant ;-) so better naming ideas are welcome. Or we should find a way to make APPEND polymorphic with regard to its input.

Member Author

@asfernandes, May 15, 2021

Perhaps we could create a VARIANT type which could be used as system routines arguments - and also as general data type.

System functions already can work in this way, but they do not have stored metadata.

Member

While the VARIANT type might be an interesting idea per se, it requires some serious thinking and discussion, and it could be overkill for this particular need if we need to release v5 really soon. So I'd be happier with APPEND_TEXT (or APPEND_STRING if you wish) and a separate APPEND_BLOB.

Return type: `BOOLEAN NOT NULL`.

## Function `READ_DATA`

`RDB$BLOB_UTIL.READ_DATA` is used to read chunks of data from a BLOB handle opened with `RDB$BLOB_UTIL.OPEN_BLOB`. When the BLOB is fully read and there is no more data, it returns `NULL`.

If `LENGTH` is passed with a positive number, it returns a VARBINARY of at most that length.

If `LENGTH` is `NULL` it returns just a segment of the BLOB with a maximum length of 32765.

Member

IIRC, blob segments may be up to 64KB in length. Will a longer-than-32KB segment be truncated to 32KB, with the next READ call returning the remaining part? Can there be any consequences if the segment is split into multiple parts? For example, one source segment will be written as two segments in the target blob. Blob filters may not be able to decode half-chunks properly (this is mostly about the built-in transliteration filters -- think about splitting in the middle of a multi-byte character -- although perhaps they never deal with chunks longer than 32KB).

Member Author

We could increase max VARCHAR to 64KB - 2. But also, is there an impediment to have max dsc_length of dtype_varying to 64KB?

As I understand, there should not be many places just reading and incrementing dsc_length, so dtype_cstring / dtype_varying does not need to have the constant size added to dsc_length.

It would simplify various places that subtract and re-add that value.

Member

Well, the segment split problem remains anyway, if the LENGTH argument is not NULL (and less than the segment size). So perhaps we don't need to do anything special right now. Those who use filtered blobs should either prefer under-32KB segments or avoid using this package.

Input parameters:
- `HANDLE` type `INTEGER NOT NULL`
- `LENGTH` type `INTEGER`

Return type: `VARBINARY(32767)`.
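
The `NULL`-at-end behaviour described above lends itself to a simple read loop. The following is a minimal sketch, not a definitive pattern: the chunk size of 100 and the sample BLOB contents are arbitrary choices made for illustration.

```
execute block returns (chunk varbinary(100))
as
    declare b blob = 'some longer text stored in a BLOB';
    declare bhandle integer;
begin
    bhandle = rdb$blob_util.open_blob(b);

    -- Read up to 100 bytes at a time until READ_DATA returns NULL.
    chunk = rdb$blob_util.read_data(bhandle, 100);
    while (chunk is not null) do
    begin
        suspend;
        chunk = rdb$blob_util.read_data(bhandle, 100);
    end

    -- Close the handle explicitly instead of waiting for the transaction end.
    execute procedure rdb$blob_util.close_handle(bhandle);
end
```

Because the routines operate on binary data, a chunk boundary may fall in the middle of a multi-byte character when the source is a text BLOB.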

## Function `SEEK`

`RDB$BLOB_UTIL.SEEK` is used to set the position for the next `READ_DATA`. It returns the new position.

`MODE` may be 0 (from the start), 1 (from current position) or 2 (from end).

When `MODE` is 2, `OFFSET` should be zero or negative.

Input parameters:
- `HANDLE` type `INTEGER NOT NULL`
- `MODE` type `INTEGER NOT NULL`
- `OFFSET` type `INTEGER NOT NULL`

Return type: `INTEGER NOT NULL`.
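
Since `SEEK` returns the new position, its result can be captured in a variable. The sketch below assumes that positions are offsets counted from the start of the BLOB, in which case seeking with `MODE` 2 and `OFFSET` 0 would report the BLOB length. The table `blob_demo` is a hypothetical name used only for illustration.

```
create table blob_demo (b blob);

set term !;

execute block returns (pos integer)
as
    declare b blob;
    declare bhandle integer;
begin
    -- Create a stream BLOB, fill it and materialize it.
    b = rdb$blob_util.new_blob(false, true);
    b = blob_append(b, '0123456789');
    insert into blob_demo (b) values (:b);

    bhandle = rdb$blob_util.open_blob(b);

    -- MODE 2 (from end) with OFFSET 0: position at the end of the BLOB.
    pos = rdb$blob_util.seek(bhandle, 2, 0);
    suspend;

    -- MODE 0 (from the start) with OFFSET 0: back to the beginning.
    pos = rdb$blob_util.seek(bhandle, 0, 0);
    suspend;

    execute procedure rdb$blob_util.close_handle(bhandle);
end!

set term ;!
```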

## Procedure `CANCEL_BLOB`

`RDB$BLOB_UTIL.CANCEL_BLOB` is used to immediately release a temporary BLOB, like one created with `BLOB_APPEND`.

Note that if the same BLOB is used after it has been cancelled, whether through the same variable or through another one that references the same BLOB id, an invalid BLOB id error is raised.
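
A minimal sketch of releasing a temporary BLOB early follows. This section does not list the parameters of `CANCEL_BLOB`, so the call below assumes that it takes the BLOB as its only argument.

```
execute block
as
    declare b blob;
begin
    -- Temporary BLOB created by BLOB_APPEND.
    b = blob_append(null, 'scratch data');

    -- Release the temporary BLOB immediately.
    -- Assumption: CANCEL_BLOB takes the BLOB as its single argument.
    execute procedure rdb$blob_util.cancel_blob(b);

    -- The variable b must not be used after this point.
end
```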

## Procedure `CLOSE_HANDLE`

`RDB$BLOB_UTIL.CLOSE_HANDLE` is used to close a BLOB handle opened with `RDB$BLOB_UTIL.OPEN_BLOB`.

Handles that are not closed explicitly are closed automatically only at the end of the transaction.

Member

With parameters being BLOB rather than INTEGER handle, I'd just call it "CLOSE". And maybe think about "auto-close" scenarios in some cases.

Input parameter:
- `HANDLE` type `INTEGER NOT NULL`

# Examples

- Example 1: Create a BLOB in temporary space and return it in `EXECUTE BLOCK`:

```
execute block returns (b blob)
as
begin
-- Create a BLOB in temporary storage.
b = rdb$blob_util.new_blob(false, true);

-- Add chunks of data.
b = blob_append(b, '12345');
b = blob_append(b, '67');

suspend;
end
```

- Example 2: Open a BLOB and return chunks of it with `EXECUTE BLOCK`:

```
execute block returns (s varchar(10))
as
declare b blob = '1234567';
declare bhandle integer;
begin
-- Open the BLOB and get a BLOB handle.
bhandle = rdb$blob_util.open_blob(b);

-- Get chunks of data as string and return.

s = rdb$blob_util.read_data(bhandle, 3);
suspend;

s = rdb$blob_util.read_data(bhandle, 3);
suspend;

s = rdb$blob_util.read_data(bhandle, 3);
suspend;

-- Here EOF is found, so it returns NULL.
s = rdb$blob_util.read_data(bhandle, 3);
suspend;

-- Close the BLOB handle.
execute procedure rdb$blob_util.close_handle(bhandle);
end
```

- Example 3: Seek in a blob.

```
create table t(b blob);

set term !;

execute block returns (s varchar(10))
as
declare b blob;
declare bhandle integer;
begin
-- Create a stream BLOB in temporary storage.
b = rdb$blob_util.new_blob(false, true);

-- Add data.
b = blob_append(b, '0123456789');

-- Materialize the BLOB.
insert into t (b) values (:b);

-- Open the BLOB and get a BLOB handle.
bhandle = rdb$blob_util.open_blob(b);

-- Seek to offset 5 from the start.
rdb$blob_util.seek(bhandle, 0, 5);
s = rdb$blob_util.read_data(bhandle, 3);
suspend;

-- Seek to offset 2 from the start.
rdb$blob_util.seek(bhandle, 0, 2);
s = rdb$blob_util.read_data(bhandle, 3);
suspend;

-- Advance 2 from the current position.
rdb$blob_util.seek(bhandle, 1, 2);
s = rdb$blob_util.read_data(bhandle, 3);
suspend;

-- Seek to offset -1 from the end.
rdb$blob_util.seek(bhandle, 2, -1);
s = rdb$blob_util.read_data(bhandle, 3);
suspend;

-- Close the BLOB handle.
execute procedure rdb$blob_util.close_handle(bhandle);
end!

set term ;!
```

- Example 4: Check if blobs are writable:

```
create table t(b blob);

set term !;

execute block returns (bool boolean)
as
declare b blob;
begin
b = blob_append(null, 'writable');
bool = rdb$blob_util.is_writable(b);
suspend;

insert into t (b) values ('not writable') returning b into b;
bool = rdb$blob_util.is_writable(b);
suspend;
end!

set term ;!
```
2 changes: 2 additions & 0 deletions src/include/firebird/impl/msg/jrd.h
@@ -961,3 +961,5 @@ FB_IMPL_MSG(JRD, 959, quoted_str_miss, -901, "22", "024", "Missing terminating q
FB_IMPL_MSG(JRD, 960, wrong_shmem_ver, -902, "08", "006", "@1: inconsistent shared memory type/version; found @2, expected @3")
FB_IMPL_MSG(JRD, 961, wrong_shmem_bitness, -902, "08", "006", "@1-bit engine can't open database already opened by @2-bit engine")
FB_IMPL_MSG(JRD, 962, wrong_proc_plan, -281, "HY", "000", "Procedures cannot specify access type other than NATURAL in the plan")
FB_IMPL_MSG(JRD, 963, invalid_blob_util_handle, -402, "42", "000", "Invalid RDB$BLOB_UTIL handle")
FB_IMPL_MSG(JRD, 964, bad_temp_blob_id, -402, "42", "000", "Invalid temporary BLOB ID")
2 changes: 2 additions & 0 deletions src/include/gen/Firebird.pas
@@ -5302,6 +5302,8 @@ IProfilerStatsImpl = class(IProfilerStats)
isc_wrong_shmem_ver = 335545280;
isc_wrong_shmem_bitness = 335545281;
isc_wrong_proc_plan = 335545282;
isc_invalid_blob_util_handle = 335545283;
isc_bad_temp_blob_id = 335545284;
isc_gfix_db_name = 335740929;
isc_gfix_invalid_sw = 335740930;
isc_gfix_incmp_sw = 335740932;