Snapshot Package¶
Snapshot¶
- class factiva.news.snapshot.Snapshot(user_key=None, user_stats=False, query=None, snapshot_id=None)¶
Bases:
BulkNewsBase
Class that represents a Factiva Snapshot.
- Parameters
user_key (str or UserKey) – String containing the 32-character long API key. If not provided, the constructor will try to obtain its value from the FACTIVA_USERKEY environment variable.
user_stats (boolean, optional (Default: False)) – Indicates if user data has to be pulled from the server. This operation fills account detail properties along with maximum, used and remaining values. It may take several seconds to complete.
query (str or SnapshotQuery, optional) – Query used to run any of the Snapshot-related operations. If a str is provided, a simple query with a where clause is created. If other query fields are required, either provide the SnapshotQuery object at creation, or set the appropriate object values after creation. This parameter is not compatible with snapshot_id.
snapshot_id (str, optional) – String containing the 10-character long Snapshot ID. This parameter is not compatible with query.
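The rules stated for these parameters can be sketched in plain Python. This is a minimal illustration, not the library's code: resolve_snapshot_args is a hypothetical helper, and the use of ValueError is an assumption, since the exceptions raised by the constructor are not documented here.

```python
import os


def resolve_snapshot_args(user_key=None, query=None, snapshot_id=None):
    """Hypothetical helper that mirrors the documented constructor rules."""
    # query and snapshot_id are documented as incompatible with each other.
    if query is not None and snapshot_id is not None:
        raise ValueError("query and snapshot_id cannot be used together")

    # Without an explicit key, fall back to the FACTIVA_USERKEY variable.
    if user_key is None:
        user_key = os.environ.get("FACTIVA_USERKEY")
    if user_key is None:
        raise ValueError("no user_key given and FACTIVA_USERKEY is not set")

    # Documented lengths: 32 characters for the key, 10 for the snapshot ID.
    if isinstance(user_key, str) and len(user_key) != 32:
        raise ValueError("user_key must be 32 characters long")
    if snapshot_id is not None and len(snapshot_id) != 10:
        raise ValueError("snapshot_id must be 10 characters long")

    return {"user_key": user_key, "query": query, "snapshot_id": snapshot_id}
```

Checking the arguments before constructing a Snapshot may help surface configuration mistakes before any API call is made.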
See also
Stream
Class that represents the continuous Factiva News document stream.
Examples
- Creating a new Snapshot with a key string and a Where statement. Then, running a full Explain process.
>>> from factiva.news.snapshot import Snapshot
>>> my_key = "abcd1234abcd1234abcd1234abcd1234"
>>> my_query = "publication_datetime >= '2020-01-01 00:00:00' AND LOWER(language_code) = 'en'"
>>> my_snapshot = Snapshot(user_key=my_key, query=my_query)
>>> my_snapshot.process_explain()
106535
- Creating a new Snapshot from UserKey and SnapshotQuery instances. Then, running a full Analytics process.
>>> my_user = UserKey()
>>> my_query = SnapshotQuery("publication_datetime >= '2020-01-01 00:00:00' AND LOWER(language_code) = 'en'")
>>> my_query.frequency = 'YEAR'
>>> my_query.group_by_source_code = True
>>> my_query.top = 20
>>> my_snapshot = Snapshot(user_key=my_user, query=my_query)
>>> analytics_df = my_snapshot.process_analytics()
>>> analytics_df.head()
   count  publication_datetime  source_code
0  20921                  1995       NGCIOS
1  20371                  1995        LATAM
2  18303                  1995       REUTES
3  10593                  1995       EXPNSI
4   4212                  1995        MUNDO
- download_extraction_files(download_path=None)¶
Downloads the files listed in the Snapshot.last_extraction_job.files property and stores them in the folder indicated by download_path. If no download_path is provided, files are stored in a folder with the same name as the snapshot ID.
- Parameters
download_path (str, optional) – String containing the path of the folder where the files are stored. If not provided, files are stored in a folder with the same name as the snapshot ID.
- Returns
True if the files were correctly downloaded, False otherwise.
- Return type
Boolean
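The default-folder rule described above can be written as a one-line helper. This is only a sketch: resolve_download_folder is a hypothetical name and is not part of the library.

```python
from pathlib import Path


def resolve_download_folder(download_path, snapshot_id):
    """Hypothetical helper: pick the folder that receives the files."""
    # An explicit download_path wins; otherwise use the snapshot ID as the name.
    return Path(download_path) if download_path else Path(snapshot_id)


print(resolve_download_folder(None, "sdjjekl93j"))
print(resolve_download_folder("../downloads/data", "sdjjekl93j"))
```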
- download_update_files(download_path=None)¶
Downloads the files listed in the Snapshot.last_update_job.files property and stores them in the folder indicated by download_path. If no download_path is provided, files are stored in a folder with the same name as the update ID.
- Parameters
download_path (str, optional) – String containing the path of the folder where the files are stored. If not provided, files are stored in a folder with the same name as the update ID.
- Raises
RuntimeError – when an update job has not been submitted.
- Returns
True if the files were correctly downloaded, False otherwise.
- Return type
Boolean
- file_format = ''¶
- file_list = []¶
- folder_path = ''¶
- get_analytics_job_results()¶
Obtains the Analytics job results from the Factiva Snapshots API. Results are stored in the last_analytics_job class property.
- Returns
True if the data was retrieved successfully. An Exception otherwise.
- Return type
Boolean
- get_explain_job_results()¶
Obtains the Explain job results from the Factiva Snapshots API. Results are stored in the last_explain_job class property.
- Returns
True if the data was retrieved successfully. An Exception otherwise.
- Return type
Boolean
- get_explain_job_samples(num_samples=10)¶
Obtains the Explain job samples from the Factiva Snapshots API.
Returns a list of up to 100 sample documents (no full text), each including title and metadata fields.
- Parameters
num_samples (int, optional (Default: 10)) – Number of sample documents to retrieve from the Explain job.
- Returns
True if the data was retrieved successfully. An Exception otherwise.
- Return type
Boolean
- get_extraction_job_results()¶
Obtains the Extraction job results from the Factiva Snapshots API. Results are stored in the last_extraction_job class property.
- Returns
True if the data was retrieved successfully. An Exception otherwise.
- Return type
Boolean
- get_update_job_results()¶
Obtains the Update job results from the Factiva Snapshots API. Results are stored in the last_update_job class property.
- Raises
RuntimeError – when an update job has not been submitted.
- Returns
True if the data was retrieved successfully. An Exception otherwise.
- Return type
Boolean
- last_analytics_job = None¶
- last_explain_job = None¶
- last_extraction_job = None¶
- last_update_job = None¶
- news_data = None¶
- process_analytics()¶
Submits an Analytics job to the Factiva Snapshots API, using the same parameters used by submit_analytics_job. Then, monitors the job until its status changes to JOB_STATE_DONE. Finally, retrieves and stores the results in the last_analytics_job property.
- Returns
True if the analytics processing was successful. An Exception otherwise.
- Return type
Boolean
Examples
- Process analytics job
>>> query_clause = "publication_datetime >= '2018-01-01 00:00:00' AND publication_datetime <= '2018-01-02 00:00:00' AND LOWER(language_code) = 'en'"
>>> my_snapshot = Snapshot(user_key='abcd1234abcd1234abcd1234abcd1234', query=query_clause)
>>> my_snapshot.process_analytics()
>>> print(my_snapshot.last_analytics_job.data)
    publication_datetime   count
0                2018-01  950516
1                2018-02  929795
2                2018-03  998663
3                2018-04  935845
4                2018-05  894903
5                2018-06  876938
6                2018-07  867509
7                2018-08  793283
8                2018-09  858963
9                2018-10  957739
10               2018-11  917355
11               2018-12   38401
- process_explain()¶
Submits an Explain job to the Factiva Snapshots API, using the same parameters used by submit_explain_job. Then, monitors the job until its status changes to JOB_STATE_DONE. Finally, retrieves and stores the results in the last_explain_job property.
- Returns
True if the explain processing was successful. An Exception otherwise.
- Return type
Boolean
Examples
- Process explain job from snapshot
>>> query_clause = "publication_datetime >= '2018-01-01 00:00:00' AND publication_datetime <= '2018-01-02 00:00:00' AND LOWER(language_code) = 'en'"
>>> my_snapshot = Snapshot(user_key='abcd1234abcd1234abcd1234abcd1234', query=query_clause)
>>> try:
...     my_snapshot.process_explain()
... except RuntimeError:
...     print('There was an error with the API call')
...
>>> print(my_snapshot.last_explain_job.document_volume)
450483
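One possible use of the Explain result is to check the document volume before committing to a long extraction. The sketch below uses only members documented on this page (process_explain, last_explain_job.document_volume, process_extraction); extract_if_small and max_documents are hypothetical names, and snapshot is any object that behaves like a Snapshot instance.

```python
def extract_if_small(snapshot, max_documents, download_path=None):
    """Run an Explain first and extract only when the volume is acceptable.

    `snapshot` is expected to behave like a Snapshot instance.
    Returns False when the extraction is skipped.
    """
    snapshot.process_explain()
    volume = snapshot.last_explain_job.document_volume
    if volume > max_documents:
        # Too many matching documents: skip the slow extraction step.
        return False
    return snapshot.process_extraction(download_path=download_path)
```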
- process_extraction(download_path=None)¶
Submits an Extraction job to the Factiva Snapshots API, using the same parameters used by submit_extraction_job. Then, monitors the job until its status changes to JOB_STATE_DONE. The final status is retrieved and stored in the last_extraction_job property, which, among other properties, contains the list of files to download. The process then downloads all files to the specified download_path. If no download path is provided, files are stored in a folder named after the snapshot_id property. The process ends after all files are downloaded.
Because the whole process takes place in a single call, this operation is expected to take several minutes, or even hours.
- Parameters
download_path (str, optional) – String containing the path of the folder where the files are stored. If not provided, files are stored in a folder with the same name as the snapshot ID.
- Returns
True if the extraction processing was successful. An Exception otherwise.
- Return type
Boolean
Examples
- Process extraction job.
>>> query_clause = "publication_datetime >= '2018-01-01 00:00:00' AND publication_datetime <= '2018-01-02 00:00:00' AND LOWER(language_code) = 'en'"
>>> my_snapshot = Snapshot(user_key='abcd1234abcd1234abcd1234abcd1234', query=query_clause)
>>> my_snapshot.process_extraction(download_path='../downloads/data')
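Since process_extraction may block for hours, the same submit, monitor and download sequence can also be driven step by step with the separate methods documented on this page. The following is a sketch under stated assumptions: run_extraction_in_steps, poll_seconds and max_polls are hypothetical, and the job_state attribute name is an assumption (this page only states that the status becomes JOB_STATE_DONE).

```python
import time


def run_extraction_in_steps(snapshot, download_path=None,
                            poll_seconds=10, max_polls=360):
    """Submit, poll and download as separate calls.

    `snapshot` is expected to behave like a Snapshot instance.
    """
    snapshot.submit_extraction_job()
    for _ in range(max_polls):
        snapshot.get_extraction_job_results()
        # ASSUMPTION: the job object exposes its status as `job_state`.
        if snapshot.last_extraction_job.job_state == "JOB_STATE_DONE":
            return snapshot.download_extraction_files(
                download_path=download_path)
        time.sleep(poll_seconds)
    raise TimeoutError("extraction job did not finish in time")
```

Splitting the steps this way leaves room to log progress or to give up after a bounded number of polls, at the cost of reimplementing the monitoring that process_extraction already provides.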
- process_update(update_type, download_path=None)¶
Submits an Update job to the Factiva Snapshots API, using the same parameters used by submit_update_job. Then, monitors the job until its status changes to JOB_STATE_DONE. The final status is retrieved and stored in the last_update_job property, which, among other properties, contains the list of files to download. The process then downloads all files to the specified download_path. If no download path is provided, files are stored in a folder named after the last_update_job.job_id property.
Because the whole process takes place in a single call, this operation is expected to take several minutes, or even hours.
- Parameters
update_type (str) – String containing the update type to submit a job. Could be ‘additions’, ‘replacements’ or ‘deletes’.
download_path (str, optional) – String containing the path of the folder where the files are stored. If not provided, files are stored in a folder with the same name as the update ID.
- Returns
True if the update processing was successful. An Exception otherwise.
- Return type
Boolean
Examples
- Process update job with type ‘additions’
>>> previous_snapshot = Snapshot(user_key=my_user, snapshot_id='sdjjekl93j')
>>> previous_snapshot.process_update('additions', download_path=f'./{previous_snapshot.snapshot_id}/additions/')
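To bring a snapshot fully up to date, the three documented update types could be processed in turn, each into its own folder, following the path layout of the example above. This is a sketch: process_all_updates and base_dir are hypothetical, and snapshot is any object that behaves like a Snapshot instance.

```python
UPDATE_TYPES = ("additions", "replacements", "deletes")


def process_all_updates(snapshot, base_dir="."):
    """Run process_update once per documented update type.

    Returns a dict mapping each update type to the value returned
    by process_update.
    """
    results = {}
    for update_type in UPDATE_TYPES:
        # One folder per update type, named as in the example above.
        target = f"{base_dir}/{snapshot.snapshot_id}/{update_type}/"
        results[update_type] = snapshot.process_update(
            update_type, download_path=target)
    return results
```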
- query = None¶
- submit_analytics_job()¶
Submits an Analytics job to the Factiva Snapshots API, using the user assigned in the user_key property and the SnapshotQuery assigned in the query property.
- Returns
True if the submission was successful. An Exception otherwise.
- Return type
Boolean
- submit_explain_job()¶
Submits an Explain job to the Factiva Snapshots API, using the user assigned in the user_key property and the SnapshotQuery assigned in the query property.
- Returns
True if the submission was successful. An Exception otherwise.
- Return type
Boolean
- submit_extraction_job()¶
Submits an Extraction job to the Factiva Snapshots API, using the user assigned in the user_key property and the SnapshotQuery assigned in the query property.
- Returns
True if the submission was successful. An Exception otherwise.
- Return type
Boolean
- submit_update_job(update_type)¶
Submits an Update job to the Factiva Snapshots API, using the user assigned in user_key, the snapshot_id assigned to the instance, and the update_type passed as a parameter. Assigns the submitted job to the last_update_job property.
- Parameters
update_type (str) – String containing the update type to submit a job. Could be ‘additions’, ‘replacements’ or ‘deletes’.
- Returns
True if the submission was successful. An Exception otherwise.
- Return type
Boolean
Snapshot Jobs¶
- class factiva.news.snapshot.ExplainJob(user_key)¶
Bases:
BulkNewsJob
Represent the operation of creating an Explain from the Factiva Snapshots API.
- document_volume = 0¶
- extraction_type = 'documents'¶
- get_endpoint_url()¶
Get endpoint URL.
- get_job_id(source)¶
Get job ID.
- set_job_data(source)¶
Set job data.
- class factiva.news.snapshot.AnalyticsJob(user_key)¶
Bases:
BulkNewsJob
Represent the operation of creating Analytics from the Factiva Snapshots API.
- data = []¶
- get_endpoint_url()¶
Get endpoint URL.
- get_job_id(source)¶
Get job ID.
- set_job_data(source)¶
Set job data.
- class factiva.news.snapshot.ExtractionJob(snapshot_id=None, user_key=None)¶
Bases:
BulkNewsJob
Class that represents the operation of creating a Snapshot from Factiva Snapshots API.
- file_format = ''¶
- files = []¶
- get_endpoint_url()¶
Obtain endpoint URL.
- get_job_id(source)¶
Obtain Job ID.
- process_job(payload=None, path=None)¶
Overrides the parent class method to download the files once the snapshot has been completed.
- Parameters
payload (str, Optional) – String containing the snapshot instance.
path (str, Optional) – String containing the path where the files downloaded from the snapshot are stored. If no path is given, the files are stored in a folder named after the snapshot_id in the current working directory.
- set_job_data(source)¶
Set job data.
- class factiva.news.snapshot.UpdateJob(update_type=None, snapshot_id=None, update_id=None, user_key=None)¶
Bases:
ExtractionJob
Class that represents the Snapshot Updates. There can be three types of updates: additions, replacements and deletes.
- Parameters
update_type (str, Optional) – String describing the type of update that this job represents. Requires snapshot_id to be provided as well. Not compatible with update_id.
snapshot_id (str, Optional) – String containing the ID of the snapshot that is being updated. Requires update_type to be provided as well. Not compatible with update_id.
update_id (str, Optional) – String containing the ID of an update job that has been created previously. Both update_type and snapshot_id can be obtained from this value. Not compatible with update_type or snapshot_id.
- Raises
Exception – when incompatible fields are provided, or when not enough parameters are provided to create the job.
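The compatibility rules between update_type, snapshot_id and update_id can be summarised as a small check. This is an illustrative sketch, not the library's implementation: check_update_job_args and its return values are hypothetical, and ValueError is used as one concrete kind of the Exception mentioned above.

```python
VALID_UPDATE_TYPES = ("additions", "replacements", "deletes")


def check_update_job_args(update_type=None, snapshot_id=None, update_id=None):
    """Hypothetical check of the documented UpdateJob parameter rules."""
    if update_id is not None:
        # update_id excludes both of the other parameters.
        if update_type is not None or snapshot_id is not None:
            raise ValueError(
                "update_id is not compatible with update_type or snapshot_id")
        return "by_update_id"
    # Without update_id, update_type and snapshot_id are required together.
    if update_type is None or snapshot_id is None:
        raise ValueError(
            "provide update_id, or both update_type and snapshot_id")
    if update_type not in VALID_UPDATE_TYPES:
        raise ValueError(f"unknown update_type: {update_type!r}")
    return "by_type_and_snapshot"
```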
- get_endpoint_url()¶
Get endpoint URL.
- get_job_id(source)¶
Get job ID from source.
- snapshot_id = None¶
- update_type = None¶
Snapshot Query¶
- class factiva.news.snapshot.SnapshotQuery(where, includes=None, excludes=None, select_fields=None, limit=0, file_format='avro', frequency='MONTH', date_field='publication_datetime', group_by_source_code=None, group_dimensions=None, top=10)¶
Bases:
BulkNewsQuery
Implement Snapshot Query class definition.
- date_field = ''¶
- file_format = ''¶
- frequency = ''¶
- get_analytics_query()¶
Obtain analytics Query.
- get_explain_query()¶
Obtain Base Query.
- get_extraction_query()¶
Obtain the string querying Factiva.
- group_by_source_code = None¶
- group_dimensions = None¶
- limit = 0¶
- top = 0¶