Introducing a Massive New Multilingual Dataset for Machine Translation
A recent paper by researchers from several European universities unveiled the HPLT (High Performance Language Technologies) dataset for language modeling and machine translation. This dataset incorporates monolingual and bilingual corpora sourced from web crawls by the Internet Archive and CommonCrawl, a first at this scale.
According to the team, these resources are among the largest open text corpora ever released, providing invaluable material for language modeling and machine translation training.
The USPTO Wants to Make Things Clearer for Competing Inventors
On 5 May 2024, the USPTO proposed amending its rules of practice to add a new requirement for an acceptable terminal disclaimer filed to obviate nonstatutory double patenting.
Under the proposed rule, such a terminal disclaimer would have to include an agreement by the disclaimant that the patent in which it is filed will be enforceable only if that patent is not tied, through one or more terminal disclaimers, to a patent in which any claim has been finally held unpatentable or invalid as anticipated or obvious.