Skip to content

Allow grobid_client to be called without configuration, add some unit tests, proper logging#88

Merged
lfoppiano merged 10 commits intomasterfrom
feature/remove-config_json-obligation
Aug 24, 2025
Merged

Allow grobid_client to be called without configuration, add some unit tests, proper logging#88
lfoppiano merged 10 commits intomasterfrom
feature/remove-config_json-obligation

Conversation

@lfoppiano
Copy link
Copy Markdown
Member

If the config.json is not available, the client will call grobid with the default values.

@lfoppiano lfoppiano requested a review from Copilot July 20, 2025 18:48

This comment was marked as outdated.

@lfoppiano lfoppiano requested a review from Copilot July 20, 2025 18:56

This comment was marked as outdated.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@lfoppiano lfoppiano requested a review from Copilot July 20, 2025 20:09

This comment was marked as outdated.

@lfoppiano lfoppiano requested a review from Copilot August 24, 2025 10:36

This comment was marked as outdated.

@lfoppiano lfoppiano requested a review from Copilot August 24, 2025 11:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR modifies the grobid_client to make configuration files optional and enhances the client with comprehensive logging capabilities. The client will now use default values when config.json is not available, allowing it to run without configuration.

  • Enhanced configuration loading to gracefully handle missing config files with fallback to defaults
  • Added comprehensive logging infrastructure with configurable levels, formats, and file/console output
  • Improved error handling throughout the codebase with proper exception management and logging

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

File Description
grobid_client/grobid_client.py Major refactoring with logging infrastructure, optional config loading, enhanced error handling, and improved processing feedback
config.json Updated default configuration with new logging settings and adjusted parameters
Readme.md Documentation updates reflecting optional config and Windows support clarifications

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +184 to +204
# Use print for initial loading since logger isn't configured yet
print(f"Loading configuration file from {path}")
with open(path, 'r') as config_file:
config_json = config_file.read()
# Update the default config with values from the file
file_config = json.loads(config_json)
self.config.update(file_config)
print("Configuration file loaded successfully")
except FileNotFoundError as e:
# If config file doesn't exist, keep using default values
error_msg = f"The specified config file {path} was not found"
print(f"ERROR: {error_msg}")
raise FileNotFoundError(error_msg) from e
except json.JSONDecodeError as e:
# If config exists, but it's invalid, we raise an exception
error_msg = f"Could not parse config file at {path}: {str(e)}"
print(f"ERROR: {error_msg}")
raise json.JSONDecodeError(error_msg, e.doc, e.pos) from e
except Exception as e:
error_msg = f"Error reading config file at {path}: {str(e)}"
print(f"ERROR: {error_msg}")
Copy link

Copilot AI Aug 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Using print statements during configuration loading is inconsistent with the logging infrastructure being set up. Consider using a temporary logger or deferring these messages until after logging is configured.

Suggested change
# Use print for initial loading since logger isn't configured yet
print(f"Loading configuration file from {path}")
with open(path, 'r') as config_file:
config_json = config_file.read()
# Update the default config with values from the file
file_config = json.loads(config_json)
self.config.update(file_config)
print("Configuration file loaded successfully")
except FileNotFoundError as e:
# If config file doesn't exist, keep using default values
error_msg = f"The specified config file {path} was not found"
print(f"ERROR: {error_msg}")
raise FileNotFoundError(error_msg) from e
except json.JSONDecodeError as e:
# If config exists, but it's invalid, we raise an exception
error_msg = f"Could not parse config file at {path}: {str(e)}"
print(f"ERROR: {error_msg}")
raise json.JSONDecodeError(error_msg, e.doc, e.pos) from e
except Exception as e:
error_msg = f"Error reading config file at {path}: {str(e)}"
print(f"ERROR: {error_msg}")
# Ensure logging is configured at least minimally
if not logging.getLogger().hasHandlers():
logging.basicConfig(level=logging.INFO)
logger.info(f"Loading configuration file from {path}")
with open(path, 'r') as config_file:
config_json = config_file.read()
# Update the default config with values from the file
file_config = json.loads(config_json)
self.config.update(file_config)
logger.info("Configuration file loaded successfully")
except FileNotFoundError as e:
# If config file doesn't exist, keep using default values
error_msg = f"The specified config file {path} was not found"
logger.error(error_msg)
raise FileNotFoundError(error_msg) from e
except json.JSONDecodeError as e:
# If config exists, but it's invalid, we raise an exception
error_msg = f"Could not parse config file at {path}: {str(e)}"
logger.error(error_msg)
raise json.JSONDecodeError(error_msg, e.doc, e.pos) from e
except Exception as e:
error_msg = f"Error reading config file at {path}: {str(e)}"
logger.error(error_msg)

Copilot uses AI. Check for mistakes.
Comment on lines +194 to +196
error_msg = f"The specified config file {path} was not found"
print(f"ERROR: {error_msg}")
raise FileNotFoundError(error_msg) from e
Copy link

Copilot AI Aug 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This contradicts the PR's purpose of allowing the client to work without configuration. When config_path is provided but the file doesn't exist, it should raise an exception, but the PR description suggests the client should work without config.json entirely.

Suggested change
error_msg = f"The specified config file {path} was not found"
print(f"ERROR: {error_msg}")
raise FileNotFoundError(error_msg) from e
warning_msg = f"WARNING: The specified config file {path} was not found. Using default configuration values."
print(warning_msg)
# Do not raise, just continue with defaults

Copilot uses AI. Check for mistakes.
Comment on lines +521 to +523
flavor,
start,
end,
flavor
end
Copy link

Copilot AI Aug 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter order in the recursive call has been changed - 'flavor' is now before 'start' and 'end', but the method signature has 'flavor' as the last parameter. This will cause incorrect parameter passing.

Copilot uses AI. Check for mistakes.

This comment was marked as outdated.

@lfoppiano lfoppiano requested a review from Copilot August 24, 2025 12:43

This comment was marked as outdated.

@lfoppiano lfoppiano requested a review from Copilot August 24, 2025 14:10
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a significant enhancement to the GROBID client by making configuration files optional. The client can now operate with default settings when config.json is not available, improving usability and reducing setup complexity. The changes include comprehensive test coverage, improved logging functionality, and enhanced error handling.

  • Configuration file is now optional - client uses sensible defaults when no config is provided
  • Enhanced logging system with configurable levels and output destinations
  • Comprehensive test suite with both unit and integration tests covering the new functionality

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
grobid_client/grobid_client.py Core implementation of optional config loading, enhanced logging, and improved error handling
tests/ Complete test suite with unit tests, integration tests, and configuration fixtures
config.json Updated default configuration with expanded coordinates and logging settings
Readme.md Documentation updates reflecting the optional config feature
.github/workflows/ci-build.yml CI pipeline updates to include test execution and coverage reporting
pytest.ini Test configuration and markers setup

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +46 to +47
# Use shutil.rmtree for better cleanup of non-empty directories
import shutil
Copy link

Copilot AI Aug 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import statements should be at the top of the file rather than inside methods. Move this import to the top with other imports.

Suggested change
# Use shutil.rmtree for better cleanup of non-empty directories
import shutil

Copilot uses AI. Check for mistakes.
@lfoppiano lfoppiano changed the title Allow grobid_client to be called without configuration Allow grobid_client to be called without configuration, add some unit tests Aug 24, 2025
@lfoppiano lfoppiano changed the title Allow grobid_client to be called without configuration, add some unit tests Allow grobid_client to be called without configuration, add some unit tests, proper logging Aug 24, 2025
@lfoppiano lfoppiano merged commit a08736e into master Aug 24, 2025
6 checks passed
@lfoppiano lfoppiano deleted the feature/remove-config_json-obligation branch August 24, 2025 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants