HUGE server update is now LIVE!
25th October 2016
We’ve just completed our biggest ever update to the Merops server (the part of Merops which processes and edits documents). Our testing shows the average document has more than 50 improvements compared to the previous server, including recognition improvements, new features and enhancements, and bug fixes*.
You don’t need to do anything to start using this update.
What’s new?
- Brand new reference system! More accurate, and more powerful
- More consistent and accurate! Far fewer recognition errors in all parts of a document
- Data extraction! A brand-new way to use Merops: extract metadata in under 10 seconds
- XML improvements! Including support for JATS 1.1 XML
- Improved formatting handling! More comprehensive and intelligent recognition
- 125 brand new rules!
- 220 new settings for 68 existing rules!
- Processes documents faster! Finish XML is over twice as fast!
- So much more!
Detailed list of improvements
Brand new reference system
- More recognition
- Merops can now ‘change its mind’ based on results found online
- References can be sequenced in any order
- Merops can now apply the APA style of truncating name lists, using an ellipsis plus last author
- Fewer comments, more automated changes
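As a rough illustration of the "ellipsis plus last author" truncation mentioned above, here is a minimal Python sketch. It is not Merops code: the function name and the default thresholds (list up to seven authors in full, otherwise keep the first six) are assumptions based on APA 6th edition practice, and are exposed as parameters so they can be changed.

```python
def truncate_author_list(authors, max_listed=7, keep_first=6):
    """Shorten a long author list to first N authors, an ellipsis, and the last author.

    max_listed and keep_first are assumed defaults, not Merops settings.
    """
    if len(authors) <= max_listed:
        # Short lists are left untouched.
        return list(authors)
    # Keep the first few names, then an ellipsis marker, then the final name.
    return list(authors[:keep_first]) + ["..."] + [authors[-1]]


names = [f"Author{i}" for i in range(1, 10)]  # nine hypothetical authors
print(", ".join(truncate_author_list(names)))
```

A list of seven or fewer names is returned unchanged, so the rule only affects long author lists.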
More consistent and accurate
- Merops now recognizes specialist content much more consistently and accurately between Standard Sets, even when specialist modules are turned off
- Massively improved handling of non-English documents
- Dutch dictionary added, with over 225,000 words
- More accurate recognition of capitalized words in title case, e.g. to distinguish between ‘Golf’ the car and ‘golf’ the sport, using subject detection
- Abbreviation definition suggestions now use subject and language detection to give more helpful suggestions
- The total number of terms in Merops is now over 3.5 million
Data extraction
A brand new type of process: Merops can now be set up to extract only the metadata you require, instead of analysing the whole document.
It can extract:
- All front matter data
- Document author names
- References
- Tables
This process is much faster than normal processing or Finish XML: under 10 seconds per file.
XML improvements
- Support for JATS 1.1 XML
- Added granularity to tagging address lines
- Added tagging of:
  - funding source including sponsor IDs
  - quotation sources
  - citations to headings
- <mml> attributes preserved
- Table column attributes for:
  - column width
  - character alignment
Formatting
Much more accurate preservation of emphasis, and of the author’s original formatting where required
New rules
- Remove ‘emphasis’ on punctuation, brackets, bridges, etc.
- Remove subtle/accidental format changes
- Remove highlighting
New Style Names
- Compatible with Merops 3
- More intuitive names
- More unique name starts, so names can be reached faster with keyboard shortcuts
- New reference style: ‘Location’ in a reference, to ensure accurate Finish XML
125 brand new rules
These are available using custom properties. Please contact Shabash for help turning on any of these settings.
References
- ‘Issue’ heading
- add or remove links to CrossRef
- add or remove links to PubMed
- add/remove IDs from reference list
- alert multiple references in one paragraph
- capitalization of proceedings title
- capitalization of report title
- capitalization of URL intro
- conference details sequence
- delete duplicate page ranges
- delete unnecessary content
- formatting: punctuation before ‘In’
- formatting: punctuation before article title
- formatting: punctuation before journal name
- formatting: punctuation before journal pages
- formatting: punctuation before volume number
- include/remove location on the end of journal names
- journal page range in parentheses or not
- link style: ISI
- link style: PII
- link style: Scopus
- link style: WorldCat
- remove PII links
- remove WorldCat links
- show editors of whole book in style of authors
- sort numerical citations
- specific sequence for details in book chapter reference
- specific sequence for details in online reference
- standardize language of words like ‘editors’ in references
- URL accessed date in brackets or not
- volume number convert between digits and roman numerals in book reference
- volume number convert between digits and roman numerals in journal reference
Tables
- delete empty columns
- horizontal alignment in body cells, based on content of column
- horizontal alignment in table head cells
- justification of table body cells that span across multiple cells
- justification of table head cells that span across multiple cells
- maximum table width
- merge empty cells in headings
- minimum table width
- replace ‘word wrap’ paragraph return with line break
- split rows across multiple paragraphs into rows
Miscellaneous
- add/remove hyperlink on emails
- add/remove hyperlink on URLs
- affiliations: Repeat address lines in adjacent addresses
- competing interest: Wording of ‘none declared’
- correct and standardize document IDs in references
- correct graph line description punctuation
- correspondence: Heading case
- delete ID number from panel citations in figure captions
- dynamic endnote symbol set
- dynamic footnote symbol set
- emails character style
- headings: Alert single section
- person’s role brackets in front matter
- punctuation before role in front matter
- quality report: ‘Fail’ label
- quality report: Pass threshold for Merops confidence score
- quality report: Pass threshold for number of inconsistent rules
- restrict editing scope to specific parts of the document
- standardize chemical bond dash
- subject: Customizable subject classification set
- subject: Detect subject
- subject: Generate subject paragraph in front matter
- subject: Heading
- subject: Maximum subjects count
- subject: Minimum subjects count
- subject: Punctuation after heading
- subject: Punctuation between subjects
- URLs character style
- ‘data’ must be treated as plural (was previously a part of general plural corrections)
- abbreviation of ‘April’
- add missing supplementary material
- alert heading ID jumps
- alert missing copyright statement
- alert missing spaces in general text
- alert repeated figure captions
- alert unofficial taxonomic names
- biography heading
- character styles: turn off abbreviations
- character styles: turn off document objects and their citations
- character styles: turn off heading IDs
- character styles: turn off list IDs
- comment author prefix in XML
- define/don’t define TV (television)
- delete the word ‘number’ in addresses, e.g. ‘No. 10 Spencer Road’
- drug abbreviation case
- elision options: shortest (123-5), or 2-digits (123-25)
- enable unheaded book reviews
- geology preference: MIS1 / MIS 1
- ignore supplementary material citations out of sequence
- maximum height for graphics
- maximum width for graphics
- move graphics inside margins
- move II and III after names
- North-East Asia/Northeast Asia/Northeastern Asia
- novelty statement heading
- proper nouns plural style: Thomas’s/Thomas’
- punctuation after ‘Phone’ heading in correspondence info
- punctuation before ‘etc.’
- qualifications in name lists - add/remove brackets/parentheses
- remove highlighting
- remove unnecessary formatting from punctuation/spacing
- remove unnecessary formatting from text
- require/delete novelty statement
- retrieve missing front matter content for legacy documents
- sort degrees/footnotes/email address after names in author byline
- sort header even if there is unmatched content
- spelling preference: roentgen/röntgen
- spelling preference: Sharia/Shari’ah/Shariah
- standardize special characters
- standardize title position in name lists
- unit preference: IU/iu/IUs/ius/I.U./i.u./I.U.s/i.u.s
- use plus-minus sign between mean and SD
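The two elision options in the list above, shortest (123-5) and 2-digits (123-25), can be sketched in a few lines of Python. This is only an illustration of the idea, not how Merops implements it: the function name and the `min_digits` parameter are hypothetical, and ranges whose ends have different lengths are simply left unelided here.

```python
def elide_range(start, end, min_digits=1):
    """Shorten the end of a number range by dropping digits it shares with the start.

    min_digits=1 gives the 'shortest' style, min_digits=2 the '2-digits' style.
    """
    s, e = str(start), str(end)
    if len(s) != len(e) or end <= start:
        # Different lengths or a non-increasing range: leave it in full.
        return f"{s}-{e}"
    shared = 0
    while shared < len(s) and s[shared] == e[shared]:
        shared += 1
    # Keep the digits that differ, but never fewer than min_digits.
    keep = min(max(len(e) - shared, min_digits), len(e))
    return f"{s}-{e[-keep:]}"


print(elide_range(123, 125))                # shortest style
print(elide_range(123, 125, min_digits=2))  # 2-digits style
```

With these inputs the two calls reproduce the examples from the setting name, 123-5 and 123-25.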
220 new settings for 68 existing rules
These are available using custom properties. Please contact Shabash for help turning on any of these settings.
- Correspondence: Heading: CONTACT/Contact
- Correspondence: Name/details sequence: name:¶details/name¶details/name.¶details/name; details
- Correspondence: Postal code: required
- Correspondence: Punctuation after heading: point then tab/colon then tab
- Correspondence: Require address: yes if known
- Correspondence: Require email: yes if known
- Correspondence: Require fax number: yes if known
- Correspondence: Require name: yes if known
- Correspondence: Require phone number: yes if known
- Correspondence: Telephone intro in correspondence info: none
- Crossref link style: http://dx.doi.org/ / dx.doi.org/
- Displayed equation ID style: 1)/1]
- Document object citation format: small caps
- Document object ID style: 1]/[1]/1)
- e.g.: for example
- Ellipsis brackets/spacing style in quotes: […]/[ … ]/(…)/( … )/…
- Ellipsis character style: three points (...)
- Equation citation style: EQS./Eqs./eqs.
- ‘et al.’ in citations in parentheses: and others
- ‘et al.’ in citations: and others
- ‘et al.’ in running head: and others
- GenBank number intro: GenBank/accession No./GenBank No./accession No.
- Glossary heading bridge: colon then paragraph return
- Heading IDs: dynamic
- i.e.: that is
- Less than spacing in standalone term (<2): following normal mathematics rule
- Medicine: stain style: name, 100x/name, magnification 100x/name, original magnification 100x/name (100x)/name (magnification 100x)/name (original magnification 100x)/name/100x/name/magnification 100x/name/original magnification 100x/name 100x/name magnification 100x/name original magnification 100x/name stain, 100x/name stain, magnification 100x/name stain, original magnification 100x/name stain (100x)/name stain (magnification 100x)/name stain (original magnification 100x)/name stain/100x/name stain/magnification 100x/name stain/original magnification 100x/name stain 100x/name stain magnification 100x/name stain original magnification 100x
- ‘monoclonal antibody’ abbreviation: MAB
- Person’s role in author byline: required
- Present address: Address: add if known
- Present address: Email: add if known
- Present address: Fax: add if known
- Present address: Name/details sequence: name¶details/name:¶details/name.¶details
- Present address: Name: add if known
- Present address: Phone: add if known
- Present address: Postal code: required
- Primary language: Dutch
- PubMed link style: PubMed PMID: [ID]/PubMed PMID:[ID]/PubMed PMID [ID]/PubMed ID: [ID]/PubMed ID:[ID]/PubMed ID [ID]
- Punctuation between keywords: spaced mid-dot
- Punctuation between names and editors in reference: semicolon then space
- Reference authors use et al.: use ellipsis plus last name instead
- Reference sequence (book): can be any sequence
- Reference sequence (journal): can be any sequence
- Reference sequence (thesis): can be any sequence
- References: ‘accessed on’ style: first accessed/last accessed/retrieved/retrieved:/retrieved on/retrieved on:/cited on:/cited on/cited:/cited/first date accessed/last date accessed/date accessed
- References: ‘Edited by’ style: ed by/Ed by/edited by/Edited by/ed. by/Ed. by
- References: ‘edition’ style: [2nd ed]/[2nd Ed]/[2nd Edition]/[2nd edition]/[2nd Edn]/[2nd edn]/2nd Edn./2nd edn./(2nd Edn.)/(2nd edn.)/2nd Ed./2nd ed./(2nd Ed.)/(2nd ed.)/[Ed 2]/[ed 2]/[Edn 2]/[edn 2]/edn. 2/Edn. 2/(Edn. 2)/(edn 2)/ed. 2/Ed. 2/(Ed. 2)/(ed. 2)/French/[Second Edition]/[second edition]/Second Edn./second edn./Second Edn./second edn./(Second Edn.)/(second edn.)/Second Ed./second ed./Second Ed./second ed./(Second Ed.)/(second ed.)
- References: ‘et al.’ in editor list: and others
- References: book volume format: small caps
- References: dash style to represent repeated names: 3 em dashes
- References: numeric citation style: spaced superscript
- References: PubMed link style: PubMed:ID/Pubmed:ID/http://www.ncbi.nlm.nih.gov/pubmed//www.ncbi.nlm.nih.gov/pubmed//ncbi.nlm.nih.gov/pubmed//PubMed:ID/PMID:ID
- References: Translator description style: translators/Translators/translated by/Translated by/Transl. by/Transl by/transl. by/transl by/trans. by/Trans. by/trans by/Trans by/tr. by/tr by/Tr. by/Tr by/Transl./Transl/transl./transl
- References: ‘et al.’ in author list: and others
- Require date in front matter: yes
- ‘Senior’ contraction: Sen
- South East Asia: South East Asia / Southeastern Asia
- Spacing after displayed list ID: em space/en space
- Spelling preferences: brussels sprout
- Spelling preferences: cabbalah/Cabbalah/Kabala/kabala/qabala/Qabala
- Spelling preferences: dahl/dholl
- Spelling preferences: Dewali
- Spelling preferences: gubbah
- Spelling preferences: kebob/kabob
- Spelling preferences: NA
- Spelling preferences: peekaboo
- Tag style for affiliations: none/[affiliation]/<affiliation>
- Tag style for authors: none/<aut>
- Tag style for title: none
Bugs fixed, and more bug resilient
- All 42 regressions found in the previous update have been fixed
- Dozens of new automatic system checks added, and used to fix and standardize over 13,500 errors, inefficiencies, and inconsistencies in Merops code, and permanently prevent them from recurring
- Intra-document linking now works
- Splitting columns in a table now works
Headings
Front/end matter
- Automatically generate keywords from document content
- Generate or add to correspondence info from the rest of document
- Much improved pairing of email addresses with authors
- Improved generation of missing address parts
- Automatically apply national/international standards when standardizing phone numbers, with a customizable rule for +44 (0)1… vs +441…
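To make the two phone number styles above concrete, here is a small Python sketch that converts a UK national number into either form. It is an illustration only and does not reflect how Merops works internally: the function name, the `show_trunk_zero` flag, and the sample number (taken from a range reserved for fictional use) are all assumptions, and the output is left unspaced for simplicity.

```python
import re


def format_uk_number(national, show_trunk_zero=False):
    """Convert a UK national number such as '01632 960000' to international form.

    show_trunk_zero is a hypothetical switch between '+44 (0)1...' and '+441...'.
    """
    digits = re.sub(r"\D", "", national)  # strip spaces and punctuation
    if not digits.startswith("0"):
        raise ValueError("expected a national number starting with 0")
    rest = digits[1:]  # drop the leading trunk zero
    if show_trunk_zero:
        return f"+44 (0){rest}"
    return f"+44{rest}"


print(format_uk_number("01632 960000"))
print(format_uk_number("01632 960000", show_trunk_zero=True))
```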
New table system
- Standardize horizontal alignment by character, to individual cells based on the content of the column
- Left/right align, justify, or centre content in table heads that span other cells
- Delete empty columns
- Resize tables
- Merge cells in table headings
- Auto split rows
Speed
- Initial processing is 12% faster
- Finish XML is over 100% faster
- Server loads up after a reboot in under 5 minutes (3.8× faster)
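For readers who want to translate these figures into processing times, the short Python sketch below shows the arithmetic. It assumes that "N% faster" refers to throughput (work done per unit of time) and that "3.8× faster" means the old load time was 3.8 times the new one. Both readings are assumptions, and the function names are hypothetical.

```python
def time_fraction(percent_faster):
    """Fraction of the old time now needed, if speed rose by percent_faster."""
    return 1 / (1 + percent_faster / 100)


def previous_time(current_time, speedup_factor):
    """Old duration implied by a current duration and an 'N times faster' factor."""
    return current_time * speedup_factor


print(f"Initial processing: {time_fraction(12):.0%} of the old time")
print(f"Finish XML: {time_fraction(100):.0%} of the old time")
print(f"Old reboot load time: up to {previous_time(5, 3.8):.0f} minutes")
```

Under these assumptions, 12% faster means roughly 89% of the old time, over 100% faster means less than half the old time (matching "over twice as fast"), and a load time of under 5 minutes at 3.8× implies the old load time was under 19 minutes.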