Python: TypeError: Object of type DictValue is not JSON serializable #491

Closed
Datseris opened this issue Apr 18, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@Datseris

Affects: PythonCall

Describe the bug

I am converting some old code from PyCall.jl to PythonCall.jl. Unfortunately, the PythonCall version does not work and raises this error:

Python: TypeError: Object of type DictValue is not JSON serializable  

I am not sure how easy it is to make an MWE, as it requires downloading some climate data from Copernicus, which requires an account. Here is the setup I have:

installation:

using CondaPkg
CondaPkg.add("cdsapi"; channel = "conda-forge")
using PythonCall
cdsapi = pyimport("cdsapi")

then I must make a dictionary that stores the configuration options for what data to download. I do this using something like:

config =
Dict(
    "product_type" => "monthly_averaged_reanalysis",
    "variable" => [
        "2m_temperature", "surface_pressure", "sea_surface_temperature",
    ],
    # more options...
)

then, I download the data by calling the command:

c = cdsapi.Client()
c.retrieve("reanalysis-era5-single-levels-monthly-means", Py(config), "filename.nc")

I also tried without wrapping the config in Py, but I got the same error.

Exactly the same piece of code worked with PyCall (with the same pyimport statement).

I am not sure what the problem is or what the bug is. I am happy to help solve this if you need more information.

Here I am attaching the Python code version that would work, and comes from this webpage.

import cdsapi

c = cdsapi.Client()

c.retrieve(
    'reanalysis-era5-single-levels-monthly-means',
    {
        'format': 'netcdf',
        'product_type': 'monthly_averaged_reanalysis',
        'variable': [
            '10m_wind_speed', '2m_temperature', 'cloud_base_height',
        ],
        'year': [
            '2018',
        ],
        'month': [
            '01', 
        ],
        'time': '00:00',
        'area': [
            90, -180, -90,
            180,
        ],
    },
    'download.nc')

Your system
Please provide detailed information about your system:

  • Windows 10
  • Julia = v1.10, PythonCall = v0.9.19,
    Conda status:
CondaPkg Status ...\CondaPkg.toml
Not Resolved (resolve first for more information)
Packages
  cdsapi (channel=conda-forge)
@Datseris Datseris added the bug Something isn't working label Apr 18, 2024
@Datseris
Author

Datseris commented Apr 18, 2024

Furthermore, using pybuiltins.dict(config) yields the same error, now saying "VectorValue" instead of "DictValue". I am not sure what the problem is, but it is likely to be a bug in PythonCall.jl (given that the same code worked with PyCall.jl).

@Datseris
Author

Datseris commented Apr 18, 2024

Hi there, I am back here. I think in the end this is NOT an issue with PythonCall.jl. I restored the old PyCall.jl code. It also doesn't work and fails with a similar error:

TypeError('Object of type ndarray is not JSON serializable')

Notice the different type, ndarray, here. So in the end I think this is a bug in newer versions of the cdsapi package.

@Datseris
Author

Hm, but in the cdsapi repository there is no mention of such a bug, and given the popularity of the service, if this were truly an issue with that package there would be some bug report.

Is there a way to truly pass an actual Python dictionary to the call c.retrieve(..., config, ...)? Even when I pass pybuiltins.dict(config), I am still not passing an actual Python dictionary type.

@cjdoris
Collaborator

cjdoris commented Apr 18, 2024

OK so for clarity the issue you're seeing is that Julia Dicts are by default passed to Python as a juliacall.DictValue wrapper type, which behaves like a dict in most respects but actually just wraps the underlying Julia object. And json.dump doesn't have a rule for how to serialise a juliacall.DictValue:

julia> using PythonCall

julia> x = Dict("foo" => "bar")
Dict{String, String} with 1 entry:
  "foo" => "bar"

julia> pyimport("json").dumps(x)
ERROR: Python: TypeError: Object of type DictValue is not JSON serializable

So you'll need to ensure that the object you're serialising is only composed of Python native types all the way down. pybuiltins.dict(config) certainly does return an actual Python dict but maybe somewhere deeper down it contains a Julia Dict which is being wrapped as a juliacall.DictValue when accessed. So you may need to write a little function to go through and recursively convert all juliacall.DictValues to dict (and maybe similar for juliacall.VectorValues to lists if that makes sense).
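As a rough illustration of the recursive conversion described above, here is a minimal sketch written on the Python side. The classes WrappedDict and WrappedList are hypothetical stand-ins for wrapper types such as juliacall.DictValue and juliacall.VectorValue, and to_native is a made-up helper name. The sketch assumes only that the wrappers expose items() or are iterable; it has not been checked against juliacall itself.

```python
import json
from collections.abc import Mapping


class WrappedDict(Mapping):
    """Hypothetical stand-in for a dict-like wrapper (not a real dict)."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class WrappedList:
    """Hypothetical stand-in for a list-like wrapper (not a real list)."""

    def __init__(self, items):
        self._items = items

    def __iter__(self):
        return iter(self._items)


def to_native(obj):
    """Recursively rebuild obj out of plain dicts, lists, and scalars."""
    # Scalars that json already understands are returned unchanged.
    if obj is None or isinstance(obj, (str, bytes, bool, int, float)):
        return obj
    # Anything dict-like becomes a real dict, converting keys and values.
    if hasattr(obj, "items"):
        return {to_native(k): to_native(v) for k, v in obj.items()}
    # Anything else that can be iterated becomes a real list.
    try:
        iterator = iter(obj)
    except TypeError:
        return obj  # not iterable: leave it for the serialiser to judge
    return [to_native(item) for item in iterator]


config = WrappedDict({
    "variable": WrappedList(["2m_temperature", "surface_pressure"]),
    "time": "00:00",
})
native_json = json.dumps(to_native(config))
```

Calling json.dumps(config) directly raises a TypeError for the stand-in, while the converted value serialises normally.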

@cjdoris
Collaborator

cjdoris commented Apr 18, 2024

Another option is that if you can control the JSON serialiser class being used then you could write your own which handles DictValues by converting them to dicts. This involves subclassing json.JSONEncoder and adding a method for default: https://docs.python.org/3/library/json.html#json.JSONEncoder.default
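A minimal sketch of that encoder approach, assuming you are the one calling json.dumps (a library that serialises internally may not let you pass your own encoder class). DictLike and ListLike are hypothetical stand-ins for the wrapper types, and WrapperAwareEncoder is an illustrative name, not part of any library.

```python
import json


class DictLike:
    """Hypothetical stand-in for a dict-like wrapper such as DictValue."""

    def __init__(self, data):
        self._data = data

    def items(self):
        return self._data.items()


class ListLike:
    """Hypothetical stand-in for a list-like wrapper such as VectorValue."""

    def __init__(self, items):
        self._items = items

    def __iter__(self):
        return iter(self._items)


class WrapperAwareEncoder(json.JSONEncoder):
    """Encoder that turns unknown dict-like or iterable objects into natives."""

    def default(self, o):
        # default() is only called for objects json cannot serialise itself.
        if hasattr(o, "items"):
            return dict(o.items())
        try:
            return list(o)
        except TypeError:
            # Not iterable either: fall back to the standard TypeError.
            return super().default(o)


config = DictLike({"variable": ListLike(["2m_temperature", "surface_pressure"])})
encoded = json.dumps(config, cls=WrapperAwareEncoder)
```

Because the encoder re-examines whatever default returns, nested wrappers are handled without an explicit recursive walk.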

@Datseris
Author

Hi,

Thanks for the reply, but I admit I am rather confused now... Isn't one of the things PythonCall.jl does to convert Julia data into Python data when I call the function? Here this isn't really happening, because if I were passing in Python data it would be serialized correctly, as the Python code actually works.

What am I missing here? Why would calling a numpy python function work but not this one? The serialization happens on the Python side, not the Julia side.

@Datseris
Author

For what it's worth, what you suggested, explicitly converting everything, did work:

function to_python(x)
    if x isa AbstractArray
        return PythonCall.pybuiltins.list(x)
    elseif x isa AbstractDict
        return PythonCall.pybuiltins.dict(x)
    else
        return PythonCall.Py(x)
    end
end

config = Dict(k => to_python(x) for (k, x) in config)

config = PythonCall.pybuiltins.dict(config)

and I don't get any errors.

@cjdoris
Collaborator

cjdoris commented Apr 25, 2024

It converts Julia objects to Python objects yes, but not necessarily to the types you might expect - in particular a Julia Dict is converted to a Python juliacall.DictValue not a dict. These two types behave very similarly, but the former is a simple wrapper around the Julia object, and only the latter is natively serialisable to JSON without writing some extra rules for the JSON serialiser.

@cjdoris
Collaborator

cjdoris commented Apr 25, 2024

I'm glad you got something working. I'll point out you can also use pydict instead of pybuiltins.dict, which is generally preferable from the Julia side.


2 participants