Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redshift Reflection #195

Closed
bshinnebarger opened this issue Apr 9, 2020 · 17 comments
Closed

Redshift Reflection #195

bshinnebarger opened this issue Apr 9, 2020 · 17 comments

Comments

@bshinnebarger
Copy link

Noticed what I think is an issue similar to what was resolved in 0.7.1
https://sqlalchemy-redshift.readthedocs.io/en/latest/#id1

#138

Where reflection is failing in the same area due to support for generated columns being added in this change

https://docs.sqlalchemy.org/en/13/changelog/changelog_13.html#change-1.3.11

When I changed my environment to sqlalchemy==1.3.10 the issue went away, so I think I'm on the right track

@vinceatbluelabs
Copy link

I'll note that sqlalchemy==1.3.15 seems to work fine, so not sure about 1.3.11 causing this.

@bshinnebarger
Copy link
Author

Hmm, yes it does seem to work with 1.3.15

All I did in 1.3.16 to get the error was this

engine = create_engine(f'redshift+psycopg2://{DB_USR}:{DB_PWD}@{DB_HOST}:5439/{db_schema}')
    db_meta = MetaData(bind=engine, schema='public')
mytable = Table('my_table', db_meta, autoload=True)
TypeError: get_column_info() missing 1 required positional argument: 'generated'

I believe it was this commit they added on Mar. 15th
sqlalchemy/sqlalchemy@62b7dac#diff-d33159d80d3deef1d5bdcd057dcc3d6b

@jklukas
Copy link
Member

jklukas commented Apr 10, 2020

I'd be happy to review and integrate a PR to handle the behavior you're seeing in recent sqlalchemy versions.

@bshinnebarger
Copy link
Author

I'm fairly new to SQLAlchemy for Redshift, so still trying to wrap my head around how it works under the hood, but I can certainly TRY to figure it out and submit a PR

@jklukas
Copy link
Member

jklukas commented Apr 10, 2020

For context, the maintainers here aren't actively doing new work on the project. We don't have bandwidth to do active maintenance or feature development, so we primarily are involved to vet contributions from the community.

I appreciate you writing up the problem. Even if you're not able to get to a solution, it provides a good starting point if someone in the community is able to pick it up.

@bshinnebarger
Copy link
Author

Gotcha, that makes sense. I looked into what exactly generated columns are in Postgres and it seems like it's a new feature in Postgres 12 as of late 2019. I think it's safe to assume it isn't supported by Redshift and won't be for a long time, if ever, so it doesn't need to be implemented, but somehow ignored

https://www.codeproject.com/Articles/5246207/Generated-Columns-in-PostgreSQL

I'll take a crack at it this weekend

@adblair
Copy link

adblair commented Apr 23, 2020

The problem here is that sqlalchemy-redshift is overriding the private method _get_column_info(), so it will always be vulnerable to breaking changes between releases of SQLAlchemy. I came across a similar problem in Flask-Admin last week, where the lead author of SQLAlchemy advised that library authors should:

... seek to get our help with that rather than using private API.

So @bshinnebarger, if you can't find a good way to resolve this without removing the dependence on a private method, it would be worth seeking support from SQLAlchemy directly in order to get to a safe implementation.

@bshinnebarger
Copy link
Author

bshinnebarger commented Apr 23, 2020

@adblair Yes I did see we were doing that, but I did not try to fix that. I submitted a PR a couple weeks ago #197 where I updated the logic to mimic how SQLAlchemy was handling generated columns in a recent commit of theirs. Effectively it will gracefully ignore it, as this is a feature in PostgreSQL > 12 and Redshift is still around 8 and I don't think would support this feature anytime soon

@alxfed
Copy link

alxfed commented Apr 28, 2020

Dear friends,
I'm getting the same type of error as all the people above, namely:

    File: .../sqlalchemy_redshift/dialect.py", line 685, in _get_column_info
    **kw
TypeError: _get_column_info() missing 1 required positional argument: 'generated'

What should I do to put your software to work with 1.3.16?

P.S. and these cyclical links to this topic on all the pages, promising a 'workaround' are particularly annoying if not to say very offensive.

@jklukas
Copy link
Member

jklukas commented Apr 29, 2020

It looks like a change needs to happen in the following section of code:

    def _get_column_info(self, *args, **kwargs):

        kw = kwargs.copy()

        encode = kw.pop('encode', None)

        if sa.__version__ < '1.2.0':

            # SQLAlchemy 1.2.0 introduced the 'comment' param

            del kw['comment']

        column_info = super(RedshiftDialect, self)._get_column_info(

            *args,

            **kw

        )

In particular, the call to super(RedshiftDialect, self)._get_column_info must pass a generated key in the kw dict for newer version of SQLAlchemy and must not pass that key for older versions of SQLAlchemy.

@bshinnebarger
Copy link
Author

@aixfed I submitted a PR to update this similar to how the SQL alchemy guys did it. The auto build is failing on some linting I think though, but it's a fairly small change in the section @jklukas mentions above. A workaround is to explicitly force 1.3.15 on your sqlalchemy version.

#197

@bshinnebarger
Copy link
Author

@alxfed I'm not a maintainer, just sharing what worked for me. At this very moment I'm using:

SQLAlchemy 1.3.15
sqlalchemy-redshift 0.7.7
psycopg2-binary 2.8.5

And everything works just fine when I do something like:
tbl = Table('my_table', meta_data, autoload=True)

The only thing I had to change was forcing SQLAlchemy to 1.3.15

Hope that helps until an official fix is in

@jklukas
Copy link
Member

jklukas commented May 29, 2020

I just released 0.7.9 which includes #200 and should allow use of SQLAlchemy 1.3.16+.

@GrayAn
Copy link
Contributor

GrayAn commented Jun 26, 2020

Greetings!

It looks like this bug is not gone yet. The following condition is truthy for the SQLAlchemy versions 1.3.2-1.3.9:

if sa.__version__ >= '1.3.16':
    # SQLAlchemy 1.3.16 introduced generated columns,
    # not supported in redshift
    kw['generated'] = ''

So it causes the following exception for these SQLAlchemy versions:

TypeError: _get_column_info() got an unexpected keyword argument 'generated'

The most simple solution is to convert version string representation to tuple of integers like this:

version = tuple(int(x) for x in sa.__version__.split('.'))
if version >= (1, 2, 16):
    kw['generated'] = ''

But it may crash if version number is "1.4.0b1" for example (it is the current version of SQLAlchemy package on Github by the way).

Maybe the packaging library can be used to compare versions but it means an extra dependency.

@jklukas
Copy link
Member

jklukas commented Jun 26, 2020

Thanks for reporting, @GrayAn. Your explanation makes perfect sense. I agree that it would be desirable to do some "real" parsing of the version number here.

@GrayAn
Copy link
Contributor

GrayAn commented Jun 26, 2020

I can make a pull request just don't know what is your current contribution policy. Is it OK to include packaging dependency just for a single or couple of version checks?

@jklukas
Copy link
Member

jklukas commented Jun 26, 2020

Is it OK to include packaging dependency just for a single or couple of version checks?

I believe modern sqlalchemy versions require setuptools, which requires packaging anyway. So in practice, packaging is already required and I'm fine with listing it as an explicit dependency as long as we're not strict about version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants