Canadian ‘go-to’ movie linguist creates new language for ‘Alpha’
“Since no one really knows what was spoken 20,000 years ago, Schreyer could have chosen anything. But she was determined to try to create something authentic and turned to three inter-related ‘proto-languages,’ which are languages linguists theorize are the distant forerunners of later languages.”
Hello everyone. It’s hard to believe that another year has passed by and with it many changes. If 2016 can be seen as a trend, there is no doubt that 2017 will bring us many more as the entire country faces a change in governance and leadership. Here in our state, one that is rich in language and cultural diversity, perhaps now more than ever, it is important we all recognize that change is the only constant. What we do, how we do it and why are important. It is our purpose and perhaps our destiny to leave a positive and indelible mark on those we help.
I am honored and excited to serve you, the membership and the Board as well as the Arizona community at large, as President of Arizona Translator & Interpreters, Inc. I wanted to start out by mentioning our mission, but to do that perhaps, we need to take a concrete look at what our mission is and ensure it is well defined so that that we can measure all our activities, allocate our resources and assign our priorities efficiently against that yardstick.
As a volunteer, non-profit organization, ATI will seek and implement new ways to strengthen existing programs while introducing some new meaningful initiatives. Considering the accomplishments of the outgoing President and officers, we have some big shoes to fill but will have ample opportunity to succeed and I’m certain we will!
I know each of you understands the need for ATI to continue to deepen its footprint within the state of Arizona and the nation. As members of this organization, we embrace our motto; “We Teach, We Share, We Learn”. Simple words yes, but they have meaning to those of us dedicated to the pursuit of accurate communication at all levels. Our members represent a varied demographic supporting their communities both large and small. From rural to suburban to densely urban you provide support in all areas including; government, education, healthcare and the private sector. The amazing work you do and support you all provide is awe inspiring! Diversity, culture and language are our strengths as we work towards achieving our mission of advancing and supporting interpreting and translation within Arizona and beyond.
Although I have ideas of what I would like to see happen this year, I look to you, our members, our teachers, our mentors, to suggest and help shape the initiatives of where and how we can improve the value provided by ATI. Some things to think about for the coming year:
– Increase the amount of information available to our membership through increased focus on; industry changes and trends, the impact of new technologies, government and regulatory trends in healthcare and other programs, and international opportunities.
– Further engage the private sector and startup communities to see how we can work together towards common goals to improve the lives of everyone through better communication and conversation.
– Increase ATI’s presence with other user groups and strengthen our ties with other similar organizations and;
– Expand education and training opportunities for interpreters and translators to strengthen and specialize their skills.
So, drop me a line! Tell me what you think. You can always contact me by email; email@example.com
As a former CEO, Director and COO for three organizations within our field I’ve worked with some of the largest customers and I recognize that what you do is unique and valuable. ATI is your voice and we are committed to the premise that language services must be supported. Professional and qualified interpreters and translators are the tool and the solution to resolve miscommunication.
2016 has just entered the history books. Looking back at the accomplishments, I can say “What a year that was!” But looking ahead to 2017, I truly believe; you haven’t seen anything yet!
Thank you again for the honor of serving this great institution and I look forward to working together.
For the diverse universe of digital content generated by big data to be useful, it requires transformation for different channels (such as web, mobile, and print), conversion for various applications, and localization for other markets. This is an area of opportunity for translators and interpreters.
Go to any conference and you’ll find a few new additions to the usual buzzword bingo of industry jargon—“big data” and numbers with lots of zeroes. You’ll hear about the massive growth in digitized data, how often a given sector’s knowledge base doubles, and what companies are doing to manage and interpret that flood of data. This burgeoning trove of bytes includes structured databases, application code, images, videos, and text. You’ll also hear about machine learning and how big data contributes to making software more responsive and useful to customers’ needs.
Just how much data are we talking about? Already huge, the digital universe of content, code, and structured data grows by a mind-blowing amount every 24 hours. Each day the world creates another 2.5 quintillion bytes of data.1 This data comes from many sources, including documents, social media posts, electronic purchase transaction records, and cellphone GPS signals. That daily infusion is estimated to pump the global repository of information from the 7.9 zettabytes (7.9 x 1021 bytes) available in 2015 to 176 zettabytes by 2025.2 Keep in mind that 1 zettabyte equals 1,000,000,000,000,000,000,000 bytes—an incomprehensible number.3 And that total doesn’t include the inestimable amount of content that is spoken every day.
Whatever the content being created, this truly immense volume includes massive, unrealized potential for translation or localization. So, what does this mean for the language industry, both humans and machines?
What Is Big Data and Why Does It Matter?
When we talk about big data, we refer to new ways of taking large amounts of data and using software tools to identify previously undiscovered patterns, trends, correlations, and associations. If you’ve ever bought a book because an online retailer told you that customers with viewing histories like yours have enjoyed it, you’ve been the beneficiary of big-data analytics.
This practice became possible because of the digitization of business, government, and everyday life over the past few decades. This information is stored in massive databases of structured data and repositories of documents large and small. We feed this growing beast with more bits and bytes every day. While all organizations rely on data to run their operations, a small but growing number use it to better understand behaviors, preferences, and trends in their world. Then, using those insights, organizations can make better decisions about how they market their wares, help their customers, improve operational efficiency, or build the next great thing.4
How do they do it? It’s not easy given the diversity of structured data and text. For highly structured data, software specialized to deal with big data draws from very large databases, often distributed around a network. Then, analysts employ a new generation of business intelligence and textual analysis tools to turn this raw data into usable information and actionable insights.5 They may combine transaction data with server logs, clickstream data, social media content, and customer e-mail texts, sensor data, and phone records to extract insights. They also extract insights using advanced analytical tools, including statistical analysis, data and content mining, predictive analysis, and text analytics. Traditional business intelligence and modern data visualization software help analysts present their findings in human-readable formats.
The language industry was actually one of the first areas of interest for big data applications. One of the early mainstream applications was in the statistical machine translation (SMT) efforts of Google and Microsoft. A 2011 Common Sense Advisory (CSA) report on MT trends characterized these statistics-based approaches to MT as big-data applications because they leverage large repositories of bilingual content. For example, they compare source documents in English to their human-translated Russian variants.6
In simplistic terms, SMT translates by comparing the zeroes and ones of the source file with the translation to find correlations and patterns. In other words, massive processing power allows computers to disassemble texts and their translations, analyze the patterns, and predict translations for texts they have never seen before. Such analytics has increased the speed of language support over earlier MT solutions that relied on teams of linguists to create grammars, code them as rules, create dictionaries of bilingual translations, and then constantly modify or add to the rules as they found exceptions.
The 2011 CSA report predicted that experts would apply these mathematics-based big-data algorithms to crack inter-language communication and marketing issues as they processed more languages and a huge volume of multilingual content. And that, in fact, is what has happened.
Over the past several years, MT based on big-data analysis has drawn far more usage than the first-generation rule-based solutions. Google Translate draws massive numbers of users, which is a testament to its easy access and perceived, if not actual, improvement of the quality of MT output. Although academic research shows improvements using popular quality assessment systems such as BLEU7(bilingual evaluation understudy), these changes are not cumulative and results vary widely between languages and translatable content types (e.g., regular text, audio, video, and social media). Thus, data on quality improvement is anecdotal and may be balanced by lowered user expectations for quality.
The availability of cloud-based computing with unlimited horsepower from the likes of Amazon Web Services and Microsoft Azure supports these big-data practices. This kind of harvest and analysis will continue to grow into the “Internet of Things” as many billions of devices come online (e.g., sensors, embedded controllers, wearables, health checkers, and widgets not yet invented).
To be useful, much of this content requires transformation for different channels (such as web, mobile, and print), conversion for various applications, and localization for other markets.8 Corporate and government planners already know it’s not enough to have all that digitized information available in just a single language. Their mission is to use as much data as possible to support customer experiences for the populations that really matter to them. Otherwise, it will be impossible to engage and retain international or domestic multicultural audiences.
Just consider the requirements necessary to translate that information into other languages to make it available to a broader audience. It’s estimated that it takes 14 languages to reach 90% of the world’s most economically active populations, but most websites max out with support for just six languages or locales.9 Product and document localization at many companies lags even further behind. Spoken-language interpreting is even more limited.
As the volume of data organizations produce grows, so too will ambitions to reach a greater audience for goods and services. Client-side respondents to a recent CSA survey reported that they plan to increase translation volume by 67% over the next three years, from an average of 590 to 990 million words per year.10 This increase is one that the language industry cannot meet with current methods, and that buyers in the CSA survey sample expect to address with a combination of post-edited content from their suppliers and raw MT.
Where Big Data Fits Today—and in the Future
Organizations are starting to realize that their plans for more translation could very well exhaust the capacity of all current translators, as well as those who will enter the field in the foreseeable future.11
To help keep up with the demand, many organizations are employing both productivity enhancements for human translators and MT to overcome the challenges associated with volume, turnaround time, the need to deal with more target languages, and flat budgets. Companies invest in human translation and post-edited MT for essential business content, such as product and marketing materials that are reasonably stable. For example, translation buyers rely on a large and growing cohort of providers that employ MT to pre-process the source material and then edit the output with human linguists. A small percentage of client-side organizations also use unedited MT output for business content, such as FAQs and knowledge bases.
Besides translating a limited set of business-oriented text, some buyers have increased the use of MT to process user-generated content, such as product evaluations, hotel reviews, and forum discussions that few organizations have bothered to translate in the past. But as research conducted by CSA indicates, online consumers and business buyers alike would prefer to have user reviews translated, even if these reviews are all that gets translated.12
Figure 1: Hypothetical Correlation of Translation to Daily Content Creation
(note: “daily spend” = daily spending on language services)
Source: Common Sense Advisory, Inc.
Why the Volume of Big Data Concerns Translation Buyers and Suppliers
Big data represents enormous numbers, and it turns out that one day in the translation industry barely puts a dent in its volume. Let’s focus on just the written word and how it relates to that 2.5 quintillion bytes of data being generated every day.
Despite today’s objective of making humans more productive to save time and money, the world is far from the nirvana of having enough online content available in all languages. From years of research and consulting, we know that any discussion about whether or not to invest in translation, localization, and interpreting has to begin with a review of available data.
CSA decided to investigate the enormous challenges facing the localization industry in terms of translating what should be translated from the totality of all data that could be translated. We decided to start with a given day’s output of digital content and determine what could actually be translated if we had the entire language industry working on just that content and none of the backlog of existing data.
What is this data? It’s everything that is digitally created every day, from documents to SQL data and telemetry to digital multimedia. For this hypothetical exercise, we began with the expenditure for outsourced services. It’s estimated that translation in various forms—human, post-edited, transcreation, plus website globalization and text-centric localization—accounts for US$26.4 billion of the $38.1 billion market for language services and technology.13
We then calculated the daily amount spent by the word. We divided US$26.4 billion by 365 days and estimated that the translation sector is worth US$72 million per day. At a hypothetical rate of 20 cents per word, we estimated that professional translators would process nearly 362 million words every day. We then converted that to bytes at the rate of 9.71 characters per word, which equates to seven billion bytes of double-byte characters. (Note that some languages have fewer characters per word on average and others have more).14
Finally, we compared it with the daily volume of content creation. When we divide the 2.5 quintillion bytes by the amount of target-language content produced by language services providers, we estimate that translation firms could potentially process just 0.00000000009% of the content created every day. However, we can safely assume that much of that data will never be translated—either the material isn’t translatable or translating it doesn’t make sense.
But some of what isn’t translated today (e.g., user reviews and social media posts) is on the future agenda of enterprise translation buyers as they strive to improve the customer experience. Even if we exclude all but an infinitesimal percentage of those daily bytes, the amount of content outsourced for translation is far less than 1% of what’s created every day. And remember that we are talking about the shortfall in translation for just one day. That number doesn’t address the backlog of content not yet translated.
As the results from this hypothetical exercise indicate, if the content is translated at all, it’s typically into just six languages online (and often fewer elsewhere). This is far short of the total number of online languages that really matter for both global and domestic communication and commerce.
Of course, there are many other variables and mitigating factors that affect these calculations. For example, consider in-house translation, languages for which you should translate but don’t, and the many zettabytes of existing content. The bottom line is that there’s an enormous amount of content that will never be translated or localized. That means opportunity for the language sector, and not just the technology companies.
What Big Data Means to the Language Sector
The big data and translation needs we discussed represent an opportunity for the language sector, but many translators look at the situation and worry that widespread deployment of MT will take work away from them. Our research estimates that translators will, in fact, lose some lower value jobs to MT, but that the total amount of work they have will increase at a steady rate for the foreseeable future.
If we also consider the expansion in post-editing—a contentious topic to be sure—we see that reliance on human professionals will grow faster than the current pipeline of future translators can add capacity. As a result, translators and interpreters will require productivity benefits from big data if they are to keep pace with demand. A few will take a much bigger step and become specialists who can build, train, and improve MT engines.
On the productivity front, we see that big data today trains statistics-based MT engines and could be used to supplement the post-editing processes of other MT models. Connections to MT are available in CAT tools such as Kilgray memoQ, Memsource Cloud, and SDL Trados Studio. Meanwhile, startups like Lilt use MT output in a CAT-like tool to accelerate human translation. We have also been briefed by software developers who are evaluating big-data machine learning techniques to improve terminology, translation memory, disambiguation, and a variety of other content creation, localization, and reviewing tasks. In short, big data will underpin most of the software tools translators use. Interpreters will also benefit as MT technology evolves for spoken languages.
What does big data mean for professional linguists? Just as they saw with translation memory and terminology management, linguists will have another tool at their disposal. Employers on both the end-buyer and agency sides will expect them to use this software to speed up their work and improve the usefulness of the output because of improved analysis of the source content.
Our 2016 survey of language services providers found that 49% of respondents have already committed to post-editing MT as a service.15 As early as 2012, our research showed that 21% of freelancers had experience using the technology.16
Some will move away from the classic translation agency structure to become big-data specialists. They will create clusters of industry- and domain-specific memories and harvest, analyze, and translate content. Content curation positions in which language professionals work with data applications to integrate relevant results to “enrich” them with useful metadata (e.g., topic categorization, classification of names and entities) are just now emerging.17 These positions will allow localizers to add market-specific value to content. Some will take the next step into the global marketing mainstream, adding to their portfolio services such as transnational business intelligence to help companies better understand their markets, or cross-language semantic and sentiment analysis to cull the opinions of consumers and business buyers out of multilingual content.
Big data has increased the volume of content dramatically. At the same time, automated content enrichment and analytical tools based on big-data science will enable the training of more sophisticated tools to help humans translate the growing volume of content and enable machines to close the yawning gap between what’s generated and what’s actually translated. No doubt some linguists will view these big-data-based innovations as threats. Others will view such advances as opportunities that will help them enhance the meaning of the source content, increase the usefulness of the other tools they employ, and increase their productivity in the process.
Although it has not happened yet, we speculate that MT driven by these phenomena could remove the “cloak of invisibility” from translators, giving them greater recognition and status.18 Even if machines generated the lion’s share of translation and humans did a smaller percentage, the sheer absolute volume of human translation would increase for high-value sectors such as life sciences, other precise sectors, and belles lettres. In turn, the perceived value of human translation could increase. Why? Because when you bring in a live human, it means the transaction is very, very important. It’s not so different from accounting. Software can handle routine tasks, but when problems arise or something is critical, you bring in a high-paid accountant to deal with it.
As interlingual communication becomes transparent, we predict that the number of situations where high-value transactions occur—i.e., those requiring human translators and interpreters—will go up, not down. If provider rates increase and companies use MT to address a larger percentage of their linguistic needs, human translators could benefit as they’re paid well to render the most critical content supporting the customer experience and other high-value interactions.
Comments Off on Mid-Year Strategic Planning Meeting
The board had its Mid-Year Strategic Planning meeting on August 13, 2016 at Claim Jumper to review all the progress achieved in 2016. We also discussed all the exciting presentations for our annual conference on October 1, 2016, at Maricopa Health Integrated System in Phoenix. We are very excited about the proposals we have received and look forward to seeing leaders in the industry presenting this year! Stay tuned!
“In the Heart of Communication…
Beyond Linguistic Barriers – A United World”
All arrows point to Tucson, AZ for this year’s Arizona Translators and Interpreters yearly conference! This is absolutely exciting; for it is the first year our conference will be held outside of the Phoenix metro area. The Arizona Translators & Interpreters Inc. (ATI) Annual Conference will be held in Tucson on Saturday, October 12, 2013 at the Pima Community College Downtown Campus, located at 1255 North Stone Avenue, Tucson AZ 85709.
At Arizona Translators and Interpreters we teach, we share, we learn; and what better way to do all this than joining the many other central and southern Arizona’s interpreters and translators at our annual conference.
Some of the benefits are:
Top level professional development from experts in the industry
Meeting and greeting the speakers and networking with our sponsors as well as colleagues.
Finding where you fit best in these ever growing industries.
And… enjoying the dining activities planned for the conference.