Commit 7cc8d52

Updated depsusage evidence API and ResolveImports for python language with tests (#15)
* Updated depsusage evidence API and updated python import resolution logic

  Signed-off-by: Omkar Phansopkar <[email protected]>

* Implemented packagehint resolution and docs for import and usage evidence

  Signed-off-by: Omkar Phansopkar <[email protected]>

---------

Signed-off-by: Omkar Phansopkar <[email protected]>
1 parent b8e8fdc commit 7cc8d52

16 files changed (+394 / -135 lines)

core/ast/import.go

+1 -1

@@ -8,7 +8,7 @@ import (
 
 // ImportNode represents an import statement in a source file
 // This is a language agnostic representation of an import statement.
-// Not all attributes may be present in all languages.
+// Not all attributes may be present in all languages
 type ImportNode struct {
 	Node
 

docs/ADR.md

+1 -1

@@ -25,7 +25,7 @@ various use-cases faster and more efficiently. The key building blocks are:
 
 Common plugins include
 
-- Import Resolver: Ability to resolve imports and load the file for analysis
+- Import Resolver: Ability to resolve imports and load the file for analysis. Read more - [Imports](imports.md)
 
 Like most analysis systems, higher level of abstractions are built on top
 of lower level abstractions. This means, the analysis plugins that produce

docs/imports.md

+46

@@ -0,0 +1,46 @@
+# Imports
+
+Imports returned by `language.Resolvers().ResolveImports()` are represented as [ImportNode](/core/ast/import.go) in a language-agnostic manner. It has the following fields:
+
+- `moduleNameNode` exposed by `ModuleName`
+
+The sitter node referring to the imported package or module. It contains the entire module name, a non-empty string that can be resolved to the target package or to a source file on disk.
+
+e.g. In Python, ModuleName is `x.y.z` for each of the imports `from x.y.z import p`, `import x.y.z` and `import x.y.z as xz`.
+
+e.g. In JavaScript, ModuleName can be `../relative/import`, `@gilbarbara/eslint-config` or `express`.
+
+- `moduleItemNode` exposed by `ModuleItem`
+
+The sitter node referring to the specific item (function, class, variable, etc.) imported from the `ModuleName` above. It is empty if the entire module is imported.
+
+e.g. The Python import `from sklearn import datasets as ds` resolves to ModuleItem `datasets`.
+
+e.g. For the JavaScript import `import { hex } from 'chalk/ansi-styles'`, ModuleItem is `hex`.
+
+- `moduleAliasNode` exposed by `ModuleAlias`
+
+The sitter node referring to the alias of the import in the current scope. The alias stands for the `ModuleItem`, or for the `ModuleName` if `ModuleItem` is empty. If no alias is specified in code, it contains the node for the actual ModuleItem or ModuleName.
+
+e.g. For the Python import `from sklearn import datasets as ds`, the alias is `ds`, referring to ModuleItem `datasets`.
+However, for `import pandas as pd`, the alias `pd` refers to ModuleName `pandas`, since ModuleItem is empty.
+
+- `isWildcardImport` exposed by `IsWildcardImport`
+
+Boolean flag indicating whether the import is a wildcard import.
+
+e.g. In Python: `from seaborn import *`
+
+e.g. In Java: `import java.util.*`
+
+
+## Note
+For composite imports, multiple `ImportNode`s are generated.
+For example, `import ReactDOM, { render, flushSync as flushIt } from 'react-dom'` is resolved to three import nodes:
+```
+ImportNode{ModuleName: react-dom, ModuleItem: , ModuleAlias: ReactDOM, WildcardImport: false}
+ImportNode{ModuleName: react-dom, ModuleItem: render, ModuleAlias: render, WildcardImport: false}
+ImportNode{ModuleName: react-dom, ModuleItem: flushSync, ModuleAlias: flushIt, WildcardImport: false}
+```
+
+For other edge cases, refer to the `ImportExpectations` test cases in the `_test` files in the [lang/](/lang) directory.
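
The field rules above can be made concrete with a small, self-contained Go sketch. It does not use the project's `ast.ImportNode`, which wraps tree-sitter nodes. Instead, `importRecord`, `withDefaultAlias`, `namedImport` and `expandJSImport` are hypothetical names that work on plain strings. The sketch applies the documented `ModuleAlias` fallback and expands the composite `react-dom` import from the Note into one record per binding. Its `String` method imitates the Note's notation.

```go
package main

import "fmt"

// importRecord is a hypothetical, string-only stand-in for ast.ImportNode,
// which holds tree-sitter nodes rather than plain strings.
type importRecord struct {
	ModuleName     string
	ModuleItem     string // empty when the whole module is imported
	ModuleAlias    string // empty on input means no alias was written in code
	WildcardImport bool
}

// withDefaultAlias applies the documented fallback: without an explicit alias,
// ModuleAlias is the ModuleItem, or the ModuleName if ModuleItem is empty.
func withDefaultAlias(r importRecord) importRecord {
	if r.ModuleAlias == "" {
		if r.ModuleItem != "" {
			r.ModuleAlias = r.ModuleItem
		} else {
			r.ModuleAlias = r.ModuleName
		}
	}
	return r
}

// String imitates the notation used in the Note of docs/imports.md.
func (r importRecord) String() string {
	return fmt.Sprintf("ImportNode{ModuleName: %s, ModuleItem: %s, ModuleAlias: %s, WildcardImport: %t}",
		r.ModuleName, r.ModuleItem, r.ModuleAlias, r.WildcardImport)
}

// namedImport is one `item` or `item as alias` entry inside `{ ... }`.
type namedImport struct {
	item  string
	alias string
}

// expandJSImport is a hypothetical helper that splits
// `import Default, { a, b as c } from 'module'` into one record per binding.
func expandJSImport(module, defaultBinding string, named []namedImport) []importRecord {
	var out []importRecord
	if defaultBinding != "" {
		// Following the Note, a default binding is recorded with an empty ModuleItem.
		out = append(out, withDefaultAlias(importRecord{ModuleName: module, ModuleAlias: defaultBinding}))
	}
	for _, n := range named {
		out = append(out, withDefaultAlias(importRecord{ModuleName: module, ModuleItem: n.item, ModuleAlias: n.alias}))
	}
	return out
}

func main() {
	// import ReactDOM, { render, flushSync as flushIt } from 'react-dom'
	records := expandJSImport("react-dom", "ReactDOM", []namedImport{
		{item: "render"},
		{item: "flushSync", alias: "flushIt"},
	})
	for _, r := range records {
		fmt.Println(r)
	}
}
```

Running it prints the three lines listed in the Note. The real resolver derives the same breakdown from tree-sitter query captures, not from pre-split arguments.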

docs/usageevidence.md

+44

@@ -0,0 +1,44 @@
+# Usage evidence
+
+`UsageEvidence` represents the evidence of usage of a module item in a file. Fields:
+
+- PackageHint - string
+
+Imported modules aren't exactly the same as the packages they refer to. An import can name a submodule with separators, or the top-level module may not match the exact package name. As usage evidence, PackageHint reports a hint of the actual dependency being used.
+
+For example, the `PyYAML` package is imported as `yaml` or `yaml.composer`, where the imported top-level module `yaml` isn't equal to the package name.
+
+PackageHint is resolved from the `ModuleName` provided by [ImportNode](/core/ast/import.go) by resolving the base module and looking it up in the top-level module to dependency mapping. [Read more](https://github.com/safedep/code/issues/6)
+
+
+Moreover, this may not be the final truth, since different languages and package managers may alias packages, e.g. Shadow JAR in Java. Hence, it is just a "hint".
+The consumer of this API can verify or enrich it accurately using the required package manifest information, which is outside the scope of the code analysis framework.
+
+- Identifier - string
+
+The identifier mentioned in the code that led to this usage evidence. It can be an imported function, class or variable.
+
+For example:
+```python
+import ujson
+ujson.decode('{"a": 1, "b": 2}')
+```
+
+Here, the identifier `ujson` was used, leading to this UsageEvidence.
+
+- FilePath - string
+
+File path where the dependency was used.
+
+- Line - uint
+
+Line number where the usage was found.
+
+Note: the line number of the usage is reported, not that of the import.
+
+Fields taken directly from ImportNode. [Read more](imports.md)
+- ModuleName - string
+- ModuleItem - string
+- ModuleAlias - string
+- IsWildCardUsage - bool
+
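
Here is a minimal Go sketch of the PackageHint lookup described above. It takes the base (top-level) module of `ModuleName`, looks it up in a module-to-package mapping, and copies the ImportNode fields into an evidence record. The mapping entries, `packageHint`, `usageEvidence` and `newUsageEvidence` are hypothetical. The actual mapping source is discussed in the linked issue. Two behaviours are assumptions that the docs do not state: relative imports get no hint, and an unmapped base module is returned as its own hint.

```go
package main

import (
	"fmt"
	"strings"
)

// topLevelToPackage is a hypothetical excerpt of a top-level module to
// package mapping; it is not the project's real mapping data.
var topLevelToPackage = map[string]string{
	"yaml":    "PyYAML",
	"sklearn": "scikit-learn",
	"PIL":     "Pillow",
}

// usageEvidence mirrors the documented UsageEvidence fields (hypothetical struct).
type usageEvidence struct {
	PackageHint     string
	Identifier      string
	FilePath        string
	Line            uint
	ModuleName      string
	ModuleItem      string
	ModuleAlias     string
	IsWildCardUsage bool
}

// packageHint resolves the base module of a dotted module name and looks it
// up in the mapping. Assumptions: relative imports (leading ".") refer to
// local code and get no hint; an unmapped base module is returned unchanged.
func packageHint(moduleName string) string {
	if strings.HasPrefix(moduleName, ".") {
		return ""
	}
	base := strings.SplitN(moduleName, ".", 2)[0]
	if pkg, ok := topLevelToPackage[base]; ok {
		return pkg
	}
	return base
}

// newUsageEvidence is a hypothetical constructor: it copies the ImportNode
// fields and records where the identifier was used, not where it was imported.
func newUsageEvidence(moduleName, moduleItem, moduleAlias, identifier, filePath string, line uint) usageEvidence {
	return usageEvidence{
		PackageHint: packageHint(moduleName),
		Identifier:  identifier,
		FilePath:    filePath,
		Line:        line,
		ModuleName:  moduleName,
		ModuleItem:  moduleItem,
		ModuleAlias: moduleAlias,
	}
}

func main() {
	for _, m := range []string{"yaml", "yaml.composer", "sklearn.datasets", "pandas", ".utils"} {
		fmt.Printf("%q -> %q\n", m, packageHint(m))
	}

	// `import ujson` on line 1 and `ujson.decode(...)` on line 2 of a hypothetical app.py.
	ev := newUsageEvidence("ujson", "", "ujson", "ujson", "app.py", 2)
	fmt.Printf("%+v\n", ev)
}
```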

lang/factory_test.go

+5

@@ -7,6 +7,11 @@ import (
 	"github.com/stretchr/testify/assert"
 )
 
+type ImportExpectations struct {
+	filePath string
+	imports  []string
+}
+
 var resolveLanguageTestcases = []struct {
 	filePath string
 	exists bool

lang/fixtures/imports.py

+29

@@ -0,0 +1,29 @@
+# 1. Importing an entire module
+import prototurk
+import sys
+
+# 2. Importing a module/submodule/item with an alias
+import pandas as pd
+import langchain.chat_models as customchat
+import matplotlib.pyplot as plt
+
+# 3. Importing a module conditionally
+try:
+    import ujson
+    import plistlib as plb
+except ImportError:
+    import simplejson as smpjson
+
+# 4. Importing all functions from a module / submodule
+from seaborn import *
+from flask.helpers import *
+from xyz.pqr.mno import *
+
+# 5. Importing a specific item from a module
+from math import sqrt
+from langchain_community import llms
+
+# 6. Importing with/without an alias for a specific function
+from odbc import connect, fetch
+from sklearn import datasets as ds, metric, preprocessing as pre
+from oauthlib.oauth2 import WebApplicationClient as WAC, WebApplicationServer
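
Section 6 of this fixture exercises composite `from ... import` statements. Each of these should yield one import per name. The following Go sketch shows that expansion on a single line of source text. It is a hypothetical, string-based illustration, and `pyImport` and `expandFromImport` are invented names. The commit itself relies on the tree-sitter queries in `lang/python_resolvers.go`. The sketch deliberately ignores parenthesised name lists, line continuations and comments.

```go
package main

import (
	"fmt"
	"strings"
)

// pyImport is a hypothetical, string-only view of one resolved import.
type pyImport struct {
	ModuleName  string
	ModuleItem  string
	ModuleAlias string
	Wildcard    bool
}

// expandFromImport splits a single-line `from M import a [as b], ...`
// statement into one pyImport per imported name (simplified illustration).
func expandFromImport(stmt string) ([]pyImport, error) {
	stmt = strings.TrimSpace(stmt)
	if !strings.HasPrefix(stmt, "from ") {
		return nil, fmt.Errorf("not a from-import: %q", stmt)
	}
	module, names, ok := strings.Cut(strings.TrimPrefix(stmt, "from "), " import ")
	if !ok {
		return nil, fmt.Errorf("missing import clause: %q", stmt)
	}
	module = strings.TrimSpace(module)
	if strings.TrimSpace(names) == "*" {
		// Wildcard: there is no single item or alias to record.
		return []pyImport{{ModuleName: module, Wildcard: true}}, nil
	}
	var out []pyImport
	for _, part := range strings.Split(names, ",") {
		fields := strings.Fields(part)
		switch {
		case len(fields) == 1:
			// No alias written: the alias is the item itself.
			out = append(out, pyImport{ModuleName: module, ModuleItem: fields[0], ModuleAlias: fields[0]})
		case len(fields) == 3 && fields[1] == "as":
			out = append(out, pyImport{ModuleName: module, ModuleItem: fields[0], ModuleAlias: fields[2]})
		default:
			return nil, fmt.Errorf("unsupported import name %q", strings.TrimSpace(part))
		}
	}
	return out, nil
}

func main() {
	for _, stmt := range []string{
		"from sklearn import datasets as ds, metric, preprocessing as pre",
		"from oauthlib.oauth2 import WebApplicationClient as WAC, WebApplicationServer",
		"from flask.helpers import *",
	} {
		imports, err := expandFromImport(stmt)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		for _, imp := range imports {
			fmt.Printf("%+v\n", imp)
		}
	}
}
```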

lang/javascript_resolvers.go

+6 -7

@@ -16,7 +16,7 @@ type javascriptResolvers struct {
 
 var _ core.LanguageResolvers = (*javascriptResolvers)(nil)
 
-const wholeModuleImportQuery = `
+const jsWholeModuleImportQuery = `
 (import_statement
 (import_clause
 (identifier) @module_alias)
@@ -37,7 +37,7 @@ const wholeModuleImportQuery = `
 arguments: (arguments (string (string_fragment) @module_name))))))
 `
 
-const requireModuleQuery = `
+const jsRequireModuleQuery = `
 (lexical_declaration
 (variable_declarator
 name: (identifier) @module_alias
@@ -64,7 +64,7 @@ const requireModuleQuery = `
 arguments: (arguments (string (string_fragment) @module_name)))))
 `
 
-const specifiedItemImportQuery = `
+const jsSpecifiedItemImportQuery = `
 (import_statement
 (import_clause
 (named_imports
@@ -83,14 +83,14 @@ func (r *javascriptResolvers) ResolveImports(tree core.ParseTree) ([]*ast.Import
 	var imports []*ast.ImportNode
 
 	queryRequestItems := []ts.QueryItem{
-		ts.NewQueryItem(wholeModuleImportQuery, func(m *sitter.QueryMatch) error {
+		ts.NewQueryItem(jsWholeModuleImportQuery, func(m *sitter.QueryMatch) error {
 			node := ast.NewImportNode(data)
 			node.SetModuleAliasNode(m.Captures[0].Node)
 			node.SetModuleNameNode(m.Captures[1].Node)
 			imports = append(imports, node)
 			return nil
 		}),
-		ts.NewQueryItem(specifiedItemImportQuery, func(m *sitter.QueryMatch) error {
+		ts.NewQueryItem(jsSpecifiedItemImportQuery, func(m *sitter.QueryMatch) error {
 			node := ast.NewImportNode(data)
 			alreadyEncounteredIdentifier := false
 			for _, capture := range m.Captures {
@@ -109,7 +109,7 @@ func (r *javascriptResolvers) ResolveImports(tree core.ParseTree) ([]*ast.Import
 			imports = append(imports, node)
 			return nil
 		}),
-		ts.NewQueryItem(requireModuleQuery, func(m *sitter.QueryMatch) error {
+		ts.NewQueryItem(jsRequireModuleQuery, func(m *sitter.QueryMatch) error {
 			if len(m.Captures) < 3 {
 				return nil
 			}
@@ -147,7 +147,6 @@ func (r *javascriptResolvers) ResolveImports(tree core.ParseTree) ([]*ast.Import
 	}
 
 	err = ts.ExecuteQueries(ts.NewQueriesRequest(r.language, queryRequestItems), data, tree)
-
 	if err != nil {
 		return nil, err
 	}

lang/javascript_test.go

+2 -7

@@ -27,12 +27,7 @@ func TestJavascriptLanguageMeta(t *testing.T) {
 	})
 }
 
-type ImportExpectations struct {
-	filePath string
-	imports  []string
-}
-
-var importExpectations = []ImportExpectations{
+var javascriptImportExpectations = []ImportExpectations{
 	{
 		filePath: "fixtures/imports.js",
 		imports: []string{
@@ -78,7 +73,7 @@ func TestJavascriptLanguageResolvers(t *testing.T) {
 
 	importExpectationsMapper := make(map[string][]string)
 	importFilePaths := []string{}
-	for _, ie := range importExpectations {
+	for _, ie := range javascriptImportExpectations {
 		importFilePaths = append(importFilePaths, ie.filePath)
 		importExpectationsMapper[ie.filePath] = ie.imports
 	}

lang/python_resolvers.go

+53 -42

@@ -9,45 +9,46 @@ import (
 	sitter "github.com/smacker/go-tree-sitter"
 )
 
-const pythonImportQuery = `
+const pyWholeModuleImportQuery = `
 (import_statement
 name: ((dotted_name) @module_name))
 
+(import_statement
+name: (aliased_import
+name: ((dotted_name) @module_name)
+alias: (identifier) @module_alias))
+
+(import_from_statement
+module_name: (dotted_name) @module_name
+(wildcard_import) @wildcard_import)
+
+(import_from_statement
+module_name: (relative_import) @module_name
+(wildcard_import) @wildcard_import)
+`
+const pyItemImportQuery = `
 (import_from_statement
 module_name: (dotted_name) @module_name
 name: (dotted_name
-(identifier) @submodule_name @submodule_alias))
+(identifier) @module_item @module_item_alias))
 
 (import_from_statement
 module_name: (relative_import) @module_name
 name: (dotted_name
-(identifier) @submodule_name @submodule_alias))
-
-(import_statement
-name: (aliased_import
-name: ((dotted_name) @module_name @submodule_name)
-alias: (identifier) @submodule_alias))
+(identifier) @module_item @module_item_alias))
 
 (import_from_statement
 module_name: (dotted_name) @module_name
 name: (aliased_import
 name: (dotted_name
-(identifier) @submodule_name)
-alias: (identifier) @submodule_alias))
+(identifier) @module_item)
+alias: (identifier) @module_item_alias))
 
 (import_from_statement
 module_name: (relative_import) @module_name
 name: (aliased_import
-name: ((dotted_name) @submodule_name)
-alias: (identifier) @submodule_alias))
-
-(import_from_statement
-module_name: (dotted_name) @module_name
-(wildcard_import) @wildcard_import)
-
-(import_from_statement
-module_name: (relative_import) @module_name
-(wildcard_import) @wildcard_import)
+name: ((dotted_name) @module_item)
+alias: (identifier) @module_item_alias))
 `
 
 type pythonResolvers struct {
@@ -62,34 +63,44 @@ func (r *pythonResolvers) ResolveImports(tree core.ParseTree) ([]*ast.ImportNode
 		return nil, fmt.Errorf("failed to get data from parse tree: %w", err)
 	}
 
-	qx := ts.NewQueryExecutor(r.language.Language(), *data)
-	matches, err := qx.Execute(tree.Tree().RootNode(), pythonImportQuery)
-	if err != nil {
-		return nil, fmt.Errorf("failed to execute query: %w", err)
-	}
-
-	defer matches.Close()
-
 	var imports []*ast.ImportNode
-	err = matches.ForEach(func(m *sitter.QueryMatch) error {
-		node := ast.NewImportNode(data)
-		node.SetModuleNameNode(m.Captures[0].Node)
 
-		if len(m.Captures) > 1 {
-			if m.Captures[1].Node.Type() == "wildcard_import" {
+	queryRequestItems := []ts.QueryItem{
+		ts.NewQueryItem(pyWholeModuleImportQuery, func(m *sitter.QueryMatch) error {
+			node := ast.NewImportNode(data)
+			node.SetModuleNameNode(m.Captures[0].Node)
+
+			if len(m.Captures) > 1 && m.Captures[1].Node.Type() == "wildcard_import" {
 				node.SetIsWildcardImport(true)
 			} else {
-				node.SetIsWildcardImport(false)
-				node.SetModuleItemNode(m.Captures[1].Node)
+				node.SetModuleAliasNode(m.Captures[0].Node)
+				if len(m.Captures) > 1 {
+					node.SetModuleAliasNode(m.Captures[1].Node)
+				}
 			}
-		}
-
-		if len(m.Captures) > 2 {
+			imports = append(imports, node)
+			return nil
+		}),
+		ts.NewQueryItem(pyItemImportQuery, func(m *sitter.QueryMatch) error {
+			node := ast.NewImportNode(data)
+			node.SetModuleNameNode(m.Captures[0].Node)
+			node.SetModuleItemNode(m.Captures[1].Node)
 			node.SetModuleAliasNode(m.Captures[2].Node)
-		}
-		imports = append(imports, node)
-		return nil
-	})
+			// print node type and contents of all captures
+			// fmt.Println("Node", m.Captures[0].Node.Content(*data))
+			// for _, capture := range m.Captures {
+			// fmt.Printf("Capture: %s, %s\n", capture.Node.Type(), capture.Node.Content(*data))
+			// }
+
+			imports = append(imports, node)
+			return nil
+		}),
+	}
+
+	err = ts.ExecuteQueries(ts.NewQueriesRequest(r.language, queryRequestItems), data, tree)
+	if err != nil {
+		return nil, err
+	}
 
 	return imports, err
 }
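
The `pyWholeModuleImportQuery` callback above assumes that capture 0 is `@module_name`. It also assumes that capture 1, when present, is either `@wildcard_import` or `@module_alias`. This Go sketch lets you check that branching in isolation. `capture`, `wholeModuleImport` and `fromCaptures` are hypothetical plain-string stand-ins for the tree-sitter capture and `ast.ImportNode` types, so no tree-sitter dependency is needed.

```go
package main

import "fmt"

// capture is a hypothetical stand-in for a tree-sitter capture: node type plus text.
type capture struct {
	nodeType string
	content  string
}

// wholeModuleImport is a hypothetical string-only result for one match.
type wholeModuleImport struct {
	ModuleName  string
	ModuleAlias string
	Wildcard    bool
}

// fromCaptures mirrors the branching of the pyWholeModuleImportQuery callback:
// capture 0 is taken as @module_name; capture 1, if present, is either
// @wildcard_import or @module_alias.
func fromCaptures(caps []capture) wholeModuleImport {
	imp := wholeModuleImport{ModuleName: caps[0].content}
	if len(caps) > 1 && caps[1].nodeType == "wildcard_import" {
		imp.Wildcard = true
	} else {
		// Default the alias to the module name, then prefer an explicit alias.
		imp.ModuleAlias = caps[0].content
		if len(caps) > 1 {
			imp.ModuleAlias = caps[1].content
		}
	}
	return imp
}

func main() {
	examples := [][]capture{
		{{"dotted_name", "matplotlib.pyplot"}, {"identifier", "plt"}}, // import matplotlib.pyplot as plt
		{{"dotted_name", "sys"}},                                      // import sys
		{{"dotted_name", "seaborn"}, {"wildcard_import", "*"}},        // from seaborn import *
	}
	for _, caps := range examples {
		fmt.Printf("%+v\n", fromCaptures(caps))
	}
}
```

For `import sys`, the alias falls back to the module name, consistent with docs/imports.md. For a wildcard import, no alias is set.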
