09b7fb9 to 4c56eb8
4c56eb8 to 8edd4ce
  def read
    begin
-     @buffer << readline
+     @scanner.string = @scanner.rest + readline
Can we use @scanner << readline here?
Changing to @scanner << readline causes the following error in JRuby.
https://github.com/ruby/rexml/actions/runs/7434514894/job/20228750659#step:4:43
Error: test_rexml(REXMLTests::TestIssuezillaParsing):
REXML::ParseException: No close tag for /issuezilla/issue[118]/activity[2]
Line: -1
Position: -1
Last 80 unconsumed characters:
/Users/naitoh/ghq/github.com/naitoh/rexml/lib/rexml/parsers/treeparser.rb:28:in `parse'
/Users/naitoh/ghq/github.com/naitoh/rexml/lib/rexml/document.rb:448:in `build'
/Users/naitoh/ghq/github.com/naitoh/rexml/lib/rexml/document.rb:101:in `initialize'
org/jruby/RubyClass.java:904:in `new'
/Users/naitoh/ghq/github.com/naitoh/rexml/test/test_rexml_issuezilla.rb:8:in `block in test_rexml'
5: include Helper::Fixture
6: def test_rexml
7: doc = File.open(fixture_path("ofbiz-issues-full-177.xml")) do |f|
=> 8: REXML::Document.new(f)
9: end
10: ctr = 1
11: doc.root.each_element('//issue') do |issue|
org/jruby/RubyIO.java:1179:in `open'
/Users/naitoh/ghq/github.com/naitoh/rexml/test/test_rexml_issuezilla.rb:7:in `test_rexml'
org/jruby/RubyKernel.java:1310:in `catch'
org/jruby/RubyKernel.java:1305:in `catch'
org/jruby/RubyKernel.java:1310:in `catch'
org/jruby/RubyKernel.java:1305:in `catch'
I am not sure why the error occurs, but I suspect it may be related to ruby/strscan#78.
It seems that the issue shows that scanner.string = scanner.rest + XXX has a problem but scanner << XXX doesn't have a problem. So it may not be related...
ruby/strscan#78 has been fixed. Could you try with the latest strscan?
I tried, but it did not fix it...
@naitoh Thank you! I will try to fix it today.
ruby/strscan#83 is fixed and will be released soon!
@kou
ruby/strscan#84 has been merged into master and I confirmed that JRuby's @scanner << readline works.
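The two ways of feeding a new line to the scanner discussed in this thread can be compared with a minimal sketch. It uses only the standard strscan API; the variable names and sample strings are illustrative assumptions and are not taken from REXML.

```ruby
require "strscan"

# Approach 1: append in place with StringScanner#<<.
# The scan pointer keeps its position; only the underlying string grows.
appending = StringScanner.new(String.new("<a>"))
appending.scan(/<a>/)
appending << "<b/>"

# Approach 2: rebuild the string from the unconsumed rest plus the new data.
# Assigning to #string= replaces the scanned string and resets the pointer to 0,
# at the cost of building a new string on every read.
rebuilt = StringScanner.new(String.new("<a>"))
rebuilt.scan(/<a>/)
rebuilt.string = rebuilt.rest + "<b/>"

puts "appending: pos=#{appending.pos} rest=#{appending.rest.inspect}"
puts "rebuilt:   pos=#{rebuilt.pos} rest=#{rebuilt.rest.inspect}"
```

Both scanners end up with the same unconsumed text. They differ in whether the already-consumed prefix stays in the underlying string.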
lib/rexml/parsers/baseparser.rb
Outdated
- match = @source.match( ENTITYDECL, true ).to_a.compact
- match[0] = :entitydecl
+ match = @source.match( ENTITYDECL, true )
+ match = match.nil? ? [:entitydecl] : [:entitydecl, *match.captures.compact.reject(&:empty?)]
Hmm. This assumes that match is a StringScanner.
How about having @source.match return @scanner.captures instead of @scanner?
I believe that ruby/strscan#72 needs to be merged in order to use @scanner.captures.
Changed source#match? to return scanner.captures instead of @scanner.
Add compact option to @source.match with 8bc8955
Improve processing speed by returning @scanner.captures.compact if @compact=true and @scanner if compact=false.
Added match? and removed compact option in 50b3057.
Removed Source#match?; Source#match now returns @scanner.
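As a rough illustration of where this thread ended up (a match method that hands back the scanner itself), here is a minimal sketch. TinySource, its consume parameter, and the sample input are hypothetical stand-ins and not REXML's actual Source class; only the StringScanner calls (scan, check, [], rest) are real API.

```ruby
require "strscan"

# Hypothetical, simplified stand-in for a scanner-backed source; not REXML's code.
class TinySource
  def initialize(text)
    @scanner = StringScanner.new(text)
  end

  # Returns the scanner itself on success so callers can read captures via
  # [] or #captures without building a MatchData; returns nil on failure.
  # With consume = true the scan pointer advances past the match.
  def match(pattern, consume = false)
    matched = consume ? @scanner.scan(pattern) : @scanner.check(pattern)
    matched.nil? ? nil : @scanner
  end
end

source = TinySource.new("<name>text")
peeked = source.match(/<(\w+)>/)           # looks ahead, pointer unchanged
consumed = source.match(/<(\w+)>/, true)   # advances past "<name>"
tag_name = consumed[1]
remaining = consumed.rest
no_match = source.match(/<(\w+)>/)         # "text" does not start with "<"

puts "tag=#{tag_name.inspect} remaining=#{remaining.inspect} no_match=#{no_match.inspect}"
```

Returning the scanner avoids allocating an intermediate array on every match, which appears to be the trade-off weighed above against returning captures directly.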
8edd4ce to fcc4db8
@kou I used I don't think this is a good idea... https://github.com/ruby/rexml/actions/runs/7510468370/job/20448939679?pr=105
fcc4db8 to 8227cc2
Add compact option to Improve processing speed by returning https://github.com/ruby/rexml/actions/runs/7512802060/job/20453872698?pr=105
  def read
    begin
-     @buffer << readline
+     @scanner.string = @scanner.rest + readline
It seems that the issue shows that scanner.string = scanner.rest + XXX has a problem but scanner << XXX doesn't have a problem. So it may not be related...
It seems that calling How about providing We'll use
diff --git a/lib/rexml/parsers/baseparser.rb b/lib/rexml/parsers/baseparser.rb
index 305b120..610209e 100644
--- a/lib/rexml/parsers/baseparser.rb
+++ b/lib/rexml/parsers/baseparser.rb
@@ -223,13 +223,13 @@ module REXML
return process_instruction
when DOCTYPE_START
base_error_message = "Malformed DOCTYPE"
- @source.match(DOCTYPE_START, true)
+ @source.match?(DOCTYPE_START, true)
@nsstack.unshift(curr_ns=Set.new)
name = parse_name(base_error_message)
- if @source.match(/\A\s*\[/um, true)
+ if @source.match?(/\A\s*\[/um, true)
id = [nil, nil, nil]
@document_status = :in_doctype
- elsif @source.match(/\A\s*>/um, true)
+ elsif @source.match?(/\A\s*>/um, true)
id = [nil, nil, nil]
@document_status = :after_doctype
else
Hmm. https://github.com/ruby/rexml/pull/105/files#r1451610001 may fix this.
8bc8955 to 50b3057
It was not fixed...
Added https://github.com/ruby/rexml/actions/runs/7516658356/job/20461969215?pr=105
OK. It seems that we don't access all captured results in our use case.
ruby/ruby#9536 will fix the CI failure.
50b3057 to ec62e37
I removed https://github.com/ruby/rexml/actions/runs/7519306454/job/20467689597?pr=105
lib/rexml/parsers/baseparser.rb
Outdated
- match = @source.match( ENTITYDECL, true ).to_a.compact
- match[0] = :entitydecl
+ match = @source.match( ENTITYDECL, true )
+ match = match.nil? ? [:entitydecl] : [:entitydecl, *match.captures.compact]
Do we need match.nil? check here?
(Is there any case that the above @source.match() failed?)
All tests still succeed when the match.nil? check is removed.
However, if the string <!ENTITY> comes in, @source.match() returns nil and undefined method `captures' for nil is raised.
Since <!ENTITY> violates the XML specification, it should be treated as an error, so I removed the match.nil? check.
https://xml.coverpages.org/xmlBNF.html
EntityDecl ::= '<!ENTITY' S Name S EntityDef S? '>' /* General entities */
| '<!ENTITY' S '%' S Name S EntityDef S? '>' /* Parameter entities */
OK. Could you add a test for the case?
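The malformed-declaration case discussed here can be sketched in isolation. SIMPLE_ENTITYDECL and parse_entitydecl are hypothetical names, the pattern is a deliberately reduced assumption (general entities with a double-quoted value only, not REXML's ENTITYDECL), and ArgumentError stands in for whatever parse error the real parser raises.

```ruby
require "strscan"

# Assumption: a reduced pattern covering only `<!ENTITY Name "value">`.
SIMPLE_ENTITYDECL = /<!ENTITY\s+(\w+)\s+"([^"]*)"\s*>/

# Hypothetical helper: builds an :entitydecl event from the captures,
# or reports the declaration as malformed when the pattern does not match.
def parse_entitydecl(text)
  scanner = StringScanner.new(text)
  unless scanner.scan(SIMPLE_ENTITYDECL)
    # Without this guard a failed match yields nil, and reading captures
    # from it would surface as a NoMethodError instead of a parse error.
    raise ArgumentError, "Malformed entity declaration: #{text}"
  end
  [:entitydecl, *scanner.captures]
end

event = parse_entitydecl('<!ENTITY name "value">')

malformed_error =
  begin
    parse_entitydecl("<!ENTITY>")
    nil
  rescue ArgumentError => e
    e
  end

puts event.inspect
puts malformed_error.message
```

A test for the bare `<!ENTITY>` input could assert on the raised error in the same way.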
[Why] Using StringScanner reduces the string copying process and speeds up the process.
ec62e37 to 995d3e2
995d3e2 to ba9f7fc
…ve processing speed.
ba9f7fc to eeb45e1
kou left a comment
+1
Could you update the PR description before we merge this?
@kou
Thanks!
Thanks for your review!!!
Using StringScanner reduces string copying and improves processing speed.
I also removed unnecessary methods.
https://github.com/ruby/rexml/actions/runs/7549990000/job/20554906140?pr=105