feat: add support for python udf #133
Merged
12 commits
e633052 feat: add support for using python-udf (jesrypandawa)
aacfd47 fix: fix sample python config properties (jesrypandawa)
165c283 feat: add config mapper (jesrypandawa)
e9c66a3 fix: register the python config one by one without map (jesrypandawa)
d6fed9d fix: fix sample config (jesrypandawa)
339f435 chore: bump up version to 0.2.7 (jesrypandawa)
c4477fe fix: adding exception class and fix test (jesrypandawa)
37c9f46 fix: change primitive type for python udf config (jesrypandawa)
85e978f fix: fix python file test (jesrypandawa)
6661d74 fix: fix python exception (jesrypandawa)
fff30de fix: fix python test read zip exclude text file (jesrypandawa)
0a168ca fix: move register python config inside if statement (jesrypandawa)
17 changes: 17 additions & 0 deletions
...nctions/src/main/java/io/odpf/dagger/functions/exceptions/PythonFilesFormatException.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,17 @@ |
| | | package io.odpf.dagger.functions.exceptions; |
| | | |
| | | /** |
| | | * The type Python files format exception. |
| | | */ |
| | | public class PythonFilesFormatException extends RuntimeException { |
| | | |
| | | /** |
| | | * Instantiates a new Python files format exception. |
| | | * |
| | | * @param message the message |
| | | */ |
| | | public PythonFilesFormatException(String message) { |
| | | super(message); |
| | | } |
| | | |
| | | } |
17 changes: 17 additions & 0 deletions
...tions/src/main/java/io/odpf/dagger/functions/exceptions/PythonFilesNotFoundException.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,17 @@ |
| | | package io.odpf.dagger.functions.exceptions; |
| | | |
| | | /** |
| | | * The type Python files not found exception. |
| | | */ |
| | | public class PythonFilesNotFoundException extends RuntimeException { |
| | | |
| | | /** |
| | | * Instantiates a new Python files not found exception. |
| | | * |
| | | * @param message the message |
| | | */ |
| | | public PythonFilesNotFoundException(String message) { |
| | | super(message); |
| | | } |
| | | |
| | | } |
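
The two exception classes above separate a missing file list from an unsupported file type. As a minimal sketch of that split, assuming a comma-separated list like the one `PythonUdfManager` in this PR reads, the check could look like the following. The class name `PythonFileListSketch`, the nested stand-in exceptions, the whitespace trimming, and the use of `endsWith` (the diff uses `contains`) are assumptions made for illustration, not the merged code.

```java
import java.util.ArrayList;
import java.util.List;

final class PythonFileListSketch {

    // Stand-ins for the PR's exception classes, redeclared so the sketch compiles on its own.
    static class PythonFilesNotFoundException extends RuntimeException {
        PythonFilesNotFoundException(String message) {
            super(message);
        }
    }

    static class PythonFilesFormatException extends RuntimeException {
        PythonFilesFormatException(String message) {
            super(message);
        }
    }

    /** Splits a comma-separated file list and rejects anything that is not .py or .zip. */
    static List<String> validate(String pythonFiles) {
        if (pythonFiles == null) {
            throw new PythonFilesNotFoundException("Python files not found");
        }
        List<String> accepted = new ArrayList<>();
        for (String file : pythonFiles.split(",")) {
            String trimmed = file.trim(); // assumption: tolerate spaces around commas
            if (trimmed.endsWith(".py") || trimmed.endsWith(".zip")) {
                accepted.add(trimmed);
            } else {
                throw new PythonFilesFormatException("Python files should be in .py or .zip format");
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(validate("/path/to/function.zip, /path/to/udf.py"));
    }
}
```

Both exceptions extend `RuntimeException`, so callers are not forced to catch them; a job that wants to report the two cases differently would catch each type explicitly.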
64 changes: 64 additions & 0 deletions
dagger-functions/src/main/java/io/odpf/dagger/functions/udfs/python/PythonUdfConfig.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,64 @@ |
| | | package io.odpf.dagger.functions.udfs.python; |
| | | |
| | | import com.google.gson.Gson; |
| | | import com.google.gson.GsonBuilder; |
| | | import com.google.gson.annotations.SerializedName; |
| | | import io.odpf.dagger.common.configuration.Configuration; |
| | | import lombok.Getter; |
| | | |
| | | import static io.odpf.dagger.functions.common.Constants.*; |
| | | |
| | | public class PythonUdfConfig { |
| | | private static final Gson GSON = new GsonBuilder() |
| | | .enableComplexMapKeySerialization() |
| | | .setPrettyPrinting() |
| | | .create(); |
| | | |
| | | @SerializedName(PYTHON_FILES_KEY) |
| | | @Getter |
| | | private String pythonFiles; |
| | | |
| | | @SerializedName(PYTHON_REQUIREMENTS_KEY) |
| | | @Getter |
| | | private String pythonRequirements; |
| | | |
| | | @SerializedName(PYTHON_ARCHIVES_KEY) |
| | | @Getter |
| | | private String pythonArchives; |
| | | |
| | | @SerializedName(PYTHON_FN_EXECUTION_ARROW_BATCH_SIZE_KEY) |
| | | private Integer pythonArrowBatchSize; |
| | | |
| | | @SerializedName(PYTHON_FN_EXECUTION_BUNDLE_SIZE_KEY) |
| | | private Integer pythonBundleSize; |
| | | |
| | | @SerializedName(PYTHON_FN_EXECUTION_BUNDLE_TIME_KEY) |
| | | private Long pythonBundleTime; |
| | | |
| | | public int getPythonArrowBatchSize() { |
| | | if (pythonArrowBatchSize == null) { |
| | | return PYTHON_FN_EXECUTION_ARROW_BATCH_SIZE_DEFAULT; |
| | | } |
| | | return pythonArrowBatchSize; |
| | | } |
| | | |
| | | public int getPythonBundleSize() { |
| | | if (pythonBundleSize == null) { |
| | | return PYTHON_FN_EXECUTION_BUNDLE_SIZE_DEFAULT; |
| | | } |
| | | return pythonBundleSize; |
| | | } |
| | | |
| | | public long getPythonBundleTime() { |
| | | if (pythonBundleTime == null) { |
| | | return PYTHON_FN_EXECUTION_BUNDLE_TIME_DEFAULT; |
| | | } |
| | | return pythonBundleTime; |
| | | } |
| | | |
| | | public static PythonUdfConfig parse(Configuration configuration) { |
| | | String jsonString = configuration.getString(PYTHON_UDF_CONFIG, ""); |
| | | |
| | | return GSON.fromJson(jsonString, PythonUdfConfig.class); |
| | | } |
| | | } |
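
The getters above fall back to a default when a boxed field was absent from the JSON. The sketch below isolates that pattern without Gson so it can run on its own. The class name `PythonUdfDefaultsSketch` and the `orEmpty` helper are hypothetical; the default values 10000, 100000 and 1000 are the ones asserted by `shouldUseDefaultValueIfConfigIsNotGiven` in this PR, not read from `Constants`. `orEmpty` is included because Gson documents that `fromJson` returns null for a null or empty string, which is what `parse` would pass when the config key is unset; whether callers need that guard depends on code outside this diff.

```java
final class PythonUdfDefaultsSketch {

    // Assumed defaults, taken from the assertions in this PR's PythonUdfConfigTest.
    static final int ARROW_BATCH_SIZE_DEFAULT = 10000;
    static final int BUNDLE_SIZE_DEFAULT = 100000;
    static final long BUNDLE_TIME_DEFAULT = 1000L;

    // Boxed types so that null can mean "not present in the JSON".
    private final Integer arrowBatchSize;
    private final Integer bundleSize;
    private final Long bundleTime;

    PythonUdfDefaultsSketch(Integer arrowBatchSize, Integer bundleSize, Long bundleTime) {
        this.arrowBatchSize = arrowBatchSize;
        this.bundleSize = bundleSize;
        this.bundleTime = bundleTime;
    }

    int getArrowBatchSize() {
        return arrowBatchSize != null ? arrowBatchSize : ARROW_BATCH_SIZE_DEFAULT;
    }

    int getBundleSize() {
        return bundleSize != null ? bundleSize : BUNDLE_SIZE_DEFAULT;
    }

    long getBundleTime() {
        return bundleTime != null ? bundleTime : BUNDLE_TIME_DEFAULT;
    }

    /** Hypothetical guard: replaces a null parse result with an all-defaults instance. */
    static PythonUdfDefaultsSketch orEmpty(PythonUdfDefaultsSketch parsed) {
        return parsed != null ? parsed : new PythonUdfDefaultsSketch(null, null, null);
    }

    public static void main(String[] args) {
        PythonUdfDefaultsSketch partial = new PythonUdfDefaultsSketch(5, null, null);
        System.out.println(partial.getArrowBatchSize()); // explicit value wins
        System.out.println(partial.getBundleSize());     // falls back to the default
        System.out.println(orEmpty(null).getBundleTime());
    }
}
```

Returning primitives from the getters keeps callers free of null checks, at the cost of hiding whether a value was set explicitly.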
68 changes: 68 additions & 0 deletions
dagger-functions/src/main/java/io/odpf/dagger/functions/udfs/python/PythonUdfManager.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,68 @@ |
| | | package io.odpf.dagger.functions.udfs.python; |
| | | |
| | | import io.odpf.dagger.functions.exceptions.PythonFilesFormatException; |
| | | import io.odpf.dagger.functions.exceptions.PythonFilesNotFoundException; |
| | | import org.apache.flink.table.api.bridge.java.StreamTableEnvironment; |
| | | |
| | | import java.io.IOException; |
| | | import java.util.Enumeration; |
| | | import java.util.zip.ZipEntry; |
| | | import java.util.zip.ZipFile; |
| | | |
| | | public class PythonUdfManager { |
| | | |
| | | private StreamTableEnvironment tableEnvironment; |
| | | private PythonUdfConfig pythonUdfConfig; |
| | | |
| | | public PythonUdfManager(StreamTableEnvironment tableEnvironment, PythonUdfConfig pythonUdfConfig) { |
| | | this.tableEnvironment = tableEnvironment; |
| | | this.pythonUdfConfig = pythonUdfConfig; |
| | | } |
| | | |
| | | public void registerPythonFunctions() throws IOException { |
| | | |
| | | String inputFiles = pythonUdfConfig.getPythonFiles(); |
| | | String[] pythonFilesSource; |
| | | if (inputFiles != null) { |
| | | registerPythonConfig(); |
| | | pythonFilesSource = inputFiles.split(","); |
| | | } else { |
| | | throw new PythonFilesNotFoundException("Python files not found"); |
| | | } |
| | | |
| | | for (String pythonFile : pythonFilesSource) { |
| | | if (pythonFile.contains(".zip")) { |
| | | ZipFile zf = new ZipFile(pythonFile); |
| | | for (Enumeration e = zf.entries(); e.hasMoreElements();) { |
| | | ZipEntry entry = (ZipEntry) e.nextElement(); |
| | | String name = entry.getName(); |
| | | if (name.endsWith(".py")) { |
| | | name = name.replace(".py", "").replace("/", "."); |
| | | String udfName = name.substring(name.lastIndexOf(".") + 1); |
| | | String query = "CREATE TEMPORARY FUNCTION " + udfName.toUpperCase() + " AS '" + name + "." + udfName + "' LANGUAGE PYTHON"; |
| | | tableEnvironment.executeSql(query); |
| | | } |
| | | } |
| | | } else if (pythonFile.contains(".py")) { |
| | | String name = pythonFile.substring(pythonFile.lastIndexOf('/') + 1).replace(".py", ""); |
| | | String query = "CREATE TEMPORARY FUNCTION " + name.toUpperCase() + " AS '" + name + "." + name + "' LANGUAGE PYTHON"; |
| | | tableEnvironment.executeSql(query); |
| | | } else { |
| | | throw new PythonFilesFormatException("Python files should be in .py or .zip format"); |
| | | } |
| | | } |
| | | } |
| | | |
| | | private void registerPythonConfig() { |
| | | if (pythonUdfConfig.getPythonRequirements() != null) { |
| | | tableEnvironment.getConfig().getConfiguration().setString("python.requirements", pythonUdfConfig.getPythonRequirements()); |
| | | } |
| | | if (pythonUdfConfig.getPythonArchives() != null) { |
| | | tableEnvironment.getConfig().getConfiguration().setString("python.archives", pythonUdfConfig.getPythonArchives()); |
| | | } |
| | | tableEnvironment.getConfig().getConfiguration().setString("python.files", pythonUdfConfig.getPythonFiles()); |
| | | tableEnvironment.getConfig().getConfiguration().setInteger("python.fn-execution.arrow.batch.size", pythonUdfConfig.getPythonArrowBatchSize()); |
| | | tableEnvironment.getConfig().getConfiguration().setInteger("python.fn-execution.bundle.size", pythonUdfConfig.getPythonBundleSize()); |
| | | tableEnvironment.getConfig().getConfiguration().setLong("python.fn-execution.bundle.time", pythonUdfConfig.getPythonBundleTime()); |
| | | } |
| | | } |
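
`registerPythonFunctions` above derives each function name from a file path: the module path becomes a dotted name, its last segment is the UDF name, and the upper-cased UDF name is what gets registered. The following is a minimal, Flink-free sketch of that derivation plus the zip scan, so the string handling can be tried in isolation. The class and method names (`PythonUdfDdlSketch`, `ddlFor`, `pythonEntries`) are hypothetical. It also deviates from the diff in three ways, all of them assumptions rather than the merged behaviour: it strips only the trailing `.py` instead of replacing every occurrence, it closes the `ZipFile` with try-with-resources, and it upper-cases with `Locale.ROOT`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class PythonUdfDdlSketch {

    /** Builds the DDL for a path ending in ".py", e.g. "python/udf/multiply.py". */
    static String ddlFor(String entryName) {
        // Drop the trailing ".py" only, then turn path separators into module dots.
        String module = entryName.substring(0, entryName.length() - ".py".length()).replace('/', '.');
        // The last dotted segment is both the module's file name and the function name.
        String udfName = module.substring(module.lastIndexOf('.') + 1);
        return "CREATE TEMPORARY FUNCTION " + udfName.toUpperCase(Locale.ROOT)
                + " AS '" + module + "." + udfName + "' LANGUAGE PYTHON";
    }

    /** Lists the .py entries of a zip archive and closes the archive afterwards. */
    static List<String> pythonEntries(Path zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            for (ZipEntry entry : Collections.list(zf.entries())) {
                if (!entry.isDirectory() && entry.getName().endsWith(".py")) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway archive with one Python file and one text file.
        Path zip = Files.createTempFile("udfs", ".zip");
        try (OutputStream out = Files.newOutputStream(zip);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            for (String name : new String[] {"python/udf/multiply.py", "python/udf/data.txt"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.closeEntry();
            }
        }
        for (String entry : pythonEntries(zip)) {
            System.out.println(ddlFor(entry));
        }
        Files.delete(zip);
    }
}
```

For the entry `python/udf/multiply.py` this yields `CREATE TEMPORARY FUNCTION MULTIPLY AS 'python.udf.multiply.multiply' LANGUAGE PYTHON`, which implies the convention that each Python file defines a function with the same name as the file.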
62 changes: 62 additions & 0 deletions
dagger-functions/src/test/java/io/odpf/dagger/functions/udfs/python/PythonUdfConfigTest.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,62 @@ |
| | | package io.odpf.dagger.functions.udfs.python; |
| | | |
| | | import io.odpf.dagger.common.configuration.Configuration; |
| | | import org.junit.Assert; |
| | | import org.junit.Before; |
| | | import org.junit.Test; |
| | | import org.mockito.Mock; |
| | | |
| | | import static io.odpf.dagger.functions.common.Constants.*; |
| | | import static org.mockito.Mockito.when; |
| | | import static org.mockito.MockitoAnnotations.initMocks; |
| | | |
| | | public class PythonUdfConfigTest { |
| | | |
| | | @Mock |
| | | private Configuration configuration; |
| | | |
| | | @Before |
| | | public void setup() { |
| | | initMocks(this); |
| | | } |
| | | |
| | | @Test |
| | | public void shouldParseConfig() { |
| | | String pythonJsonConfig = "{ \"PYTHON_FILES\": \"/path/to/function.zip\", \"PYTHON_ARCHIVES\": \"/path/to/file.txt\", \"PYTHON_REQUIREMENTS\": \"requirements.txt\", \"PYTHON_FN_EXECUTION_ARROW_BATCH_SIZE\": \"10000\", \"PYTHON_FN_EXECUTION_BUNDLE_SIZE\": \"100000\", \"PYTHON_FN_EXECUTION_BUNDLE_TIME\": \"1000\" }"; |
| | | |
| | | when(configuration.getString(PYTHON_UDF_CONFIG, "")).thenReturn(pythonJsonConfig); |
| | | PythonUdfConfig pythonUdfConfig = PythonUdfConfig.parse(configuration); |
| | | |
| | | Assert.assertNotNull(pythonUdfConfig); |
| | | Assert.assertEquals(pythonUdfConfig.getPythonFiles(), "/path/to/function.zip"); |
| | | Assert.assertEquals(pythonUdfConfig.getPythonArchives(), "/path/to/file.txt"); |
| | | Assert.assertEquals(pythonUdfConfig.getPythonRequirements(), "requirements.txt"); |
| | | Assert.assertEquals(pythonUdfConfig.getPythonArrowBatchSize(), 10000); |
| | | Assert.assertEquals(pythonUdfConfig.getPythonBundleSize(), 100000); |
| | | Assert.assertEquals(pythonUdfConfig.getPythonBundleTime(), 1000); |
| | | } |
| | | |
| | | @Test |
| | | public void shouldUseDefaultValueIfConfigIsNotGiven() { |
| | | String pythonJsonConfig = "{ \"PYTHON_FILES\": \"/path/to/function.zip\", \"PYTHON_ARCHIVES\": \"/path/to/file.txt\", \"PYTHON_REQUIREMENTS\": \"requirements.txt\" }"; |
| | | |
| | | when(configuration.getString(PYTHON_UDF_CONFIG, "")).thenReturn(pythonJsonConfig); |
| | | PythonUdfConfig pythonUdfConfig = PythonUdfConfig.parse(configuration); |
| | | |
| | | Assert.assertEquals(pythonUdfConfig.getPythonArrowBatchSize(), 10000); |
| | | Assert.assertEquals(pythonUdfConfig.getPythonBundleSize(), 100000); |
| | | Assert.assertEquals(pythonUdfConfig.getPythonBundleTime(), 1000); |
| | | } |
| | | |
| | | @Test |
| | | public void shouldReturnNullIfPythonFilesConfigIsNotGiven() { |
| | | String pythonJsonConfig = "{\"PYTHON_FN_EXECUTION_ARROW_BATCH_SIZE\": \"10000\", \"PYTHON_FN_EXECUTION_BUNDLE_SIZE\": \"100000\", \"PYTHON_FN_EXECUTION_BUNDLE_TIME\": \"1000\"}"; |
| | | |
| | | when(configuration.getString(PYTHON_UDF_CONFIG, "")).thenReturn(pythonJsonConfig); |
| | | PythonUdfConfig pythonUdfConfig = PythonUdfConfig.parse(configuration); |
| | | |
| | | Assert.assertNull(pythonUdfConfig.getPythonFiles()); |
| | | Assert.assertNull(pythonUdfConfig.getPythonArchives()); |
| | | Assert.assertNull(pythonUdfConfig.getPythonRequirements()); |
| | | } |
| | | } |